---
_id: '11744'
abstract:
- lang: eng
text: Noise power spectral density (PSD) estimation is an indispensable component
of speech spectral enhancement systems. In this paper we present a noise PSD tracking
algorithm, which employs a noise presence probability estimate delivered by a
deep neural network (DNN). The algorithm provides a causal noise PSD estimate
and can thus be used in speech enhancement systems for communication purposes.
An extensive performance comparison has been carried out with ten causal state-of-the-art
noise tracking algorithms taken from the literature and categorized according to
the applied techniques. The experiments showed that the proposed DNN-based noise PSD tracker
outperforms all competing methods with respect to all tested performance measures,
which include the noise tracking performance and the performance of a speech enhancement
system employing the noise tracking component.
author:
- first_name: Aleksej
full_name: Chinaev, Aleksej
last_name: Chinaev
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Chinaev A, Heymann J, Drude L, Haeb-Umbach R. Noise-Presence-Probability-Based
Noise PSD Estimation by Using DNNs. In: 12. ITG Fachtagung Sprachkommunikation
(ITG 2016). ; 2016.'
apa: Chinaev, A., Heymann, J., Drude, L., & Haeb-Umbach, R. (2016). Noise-Presence-Probability-Based
Noise PSD Estimation by Using DNNs. In 12. ITG Fachtagung Sprachkommunikation
(ITG 2016).
bibtex: '@inproceedings{Chinaev_Heymann_Drude_Haeb-Umbach_2016, title={Noise-Presence-Probability-Based
Noise PSD Estimation by Using DNNs}, booktitle={12. ITG Fachtagung Sprachkommunikation
(ITG 2016)}, author={Chinaev, Aleksej and Heymann, Jahn and Drude, Lukas and Haeb-Umbach,
Reinhold}, year={2016} }'
chicago: Chinaev, Aleksej, Jahn Heymann, Lukas Drude, and Reinhold Haeb-Umbach.
“Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs.” In 12.
ITG Fachtagung Sprachkommunikation (ITG 2016), 2016.
ieee: A. Chinaev, J. Heymann, L. Drude, and R. Haeb-Umbach, “Noise-Presence-Probability-Based
Noise PSD Estimation by Using DNNs,” in 12. ITG Fachtagung Sprachkommunikation
(ITG 2016), 2016.
mla: Chinaev, Aleksej, et al. “Noise-Presence-Probability-Based Noise PSD Estimation
by Using DNNs.” 12. ITG Fachtagung Sprachkommunikation (ITG 2016), 2016.
short: 'A. Chinaev, J. Heymann, L. Drude, R. Haeb-Umbach, in: 12. ITG Fachtagung
Sprachkommunikation (ITG 2016), 2016.'
date_created: 2019-07-12T05:27:25Z
date_updated: 2022-01-06T06:51:08Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2016/ChHeyDrHa16.pdf
oa: '1'
publication: 12. ITG Fachtagung Sprachkommunikation (ITG 2016)
related_material:
link:
- description: Presentation
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2016/ChHeyDrHa16_Presentation.pdf
status: public
title: Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs
type: conference
user_id: '44006'
year: '2016'
...
---
_id: '11751'
author:
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Christoph
full_name: Boeddeker, Christoph
id: '40767'
last_name: Boeddeker
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Drude L, Boeddeker C, Haeb-Umbach R. Blind Speech Separation based on Complex
Spherical k-Mode Clustering. In: Proc. IEEE Intl. Conf. on Acoustics, Speech
and Signal Processing (ICASSP). ; 2016.'
apa: Drude, L., Boeddeker, C., & Haeb-Umbach, R. (2016). Blind Speech Separation
based on Complex Spherical k-Mode Clustering. In Proc. IEEE Intl. Conf. on
Acoustics, Speech and Signal Processing (ICASSP).
bibtex: '@inproceedings{Drude_Boeddeker_Haeb-Umbach_2016, title={Blind Speech Separation
based on Complex Spherical k-Mode Clustering}, booktitle={Proc. IEEE Intl. Conf.
on Acoustics, Speech and Signal Processing (ICASSP)}, author={Drude, Lukas and
Boeddeker, Christoph and Haeb-Umbach, Reinhold}, year={2016} }'
chicago: Drude, Lukas, Christoph Boeddeker, and Reinhold Haeb-Umbach. “Blind Speech
Separation Based on Complex Spherical K-Mode Clustering.” In Proc. IEEE Intl.
Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2016.
ieee: L. Drude, C. Boeddeker, and R. Haeb-Umbach, “Blind Speech Separation based
on Complex Spherical k-Mode Clustering,” in Proc. IEEE Intl. Conf. on Acoustics,
Speech and Signal Processing (ICASSP), 2016.
mla: Drude, Lukas, et al. “Blind Speech Separation Based on Complex Spherical K-Mode
Clustering.” Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing
(ICASSP), 2016.
short: 'L. Drude, C. Boeddeker, R. Haeb-Umbach, in: Proc. IEEE Intl. Conf. on Acoustics,
Speech and Signal Processing (ICASSP), 2016.'
date_created: 2019-07-12T05:27:33Z
date_updated: 2022-01-06T06:51:08Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2016/icassp_2016_drude_paper.pdf
oa: '1'
publication: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)
related_material:
link:
- description: Slides
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2016/icassp_2016_drude_slides.pdf
status: public
title: Blind Speech Separation based on Complex Spherical k-Mode Clustering
type: conference
user_id: '44006'
year: '2016'
...
---
_id: '11756'
abstract:
- lang: eng
text: Although complex-valued neural networks (CVNNs), networks which can operate
with complex arithmetic, have been around for a while, they have not been given
reconsideration since the breakthrough of deep network architectures. This paper
presents a critical assessment of whether the novel tool set of deep neural networks
(DNNs) should be extended to complex-valued arithmetic. Indeed, with DNNs making
inroads in speech enhancement tasks, the use of complex-valued input data, specifically
the short-time Fourier transform coefficients, is an obvious consideration. In
particular when it comes to performing tasks that heavily rely on phase information,
such as acoustic beamforming, complex-valued algorithms are omnipresent. In this
contribution we recapitulate backpropagation in CVNNs, develop complex-valued
network elements, such as the split-rectified non-linearity, and compare real-
and complex-valued networks on a beamforming task. We find that CVNNs hardly provide
a performance gain and conclude that the effort of developing the complex-valued
counterparts of the building blocks of modern deep or recurrent neural networks
can hardly be justified.
author:
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Bhiksha
full_name: Raj, Bhiksha
last_name: Raj
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Drude L, Raj B, Haeb-Umbach R. On the appropriateness of complex-valued neural
networks for speech enhancement. In: INTERSPEECH 2016, San Francisco, USA.
; 2016.'
apa: Drude, L., Raj, B., & Haeb-Umbach, R. (2016). On the appropriateness of
complex-valued neural networks for speech enhancement. In INTERSPEECH 2016,
San Francisco, USA.
bibtex: '@inproceedings{Drude_Raj_Haeb-Umbach_2016, title={On the appropriateness
of complex-valued neural networks for speech enhancement}, booktitle={INTERSPEECH
2016, San Francisco, USA}, author={Drude, Lukas and Raj, Bhiksha and Haeb-Umbach,
Reinhold}, year={2016} }'
chicago: Drude, Lukas, Bhiksha Raj, and Reinhold Haeb-Umbach. “On the Appropriateness
of Complex-Valued Neural Networks for Speech Enhancement.” In INTERSPEECH 2016,
San Francisco, USA, 2016.
ieee: L. Drude, B. Raj, and R. Haeb-Umbach, “On the appropriateness of complex-valued
neural networks for speech enhancement,” in INTERSPEECH 2016, San Francisco,
USA, 2016.
mla: Drude, Lukas, et al. “On the Appropriateness of Complex-Valued Neural Networks
for Speech Enhancement.” INTERSPEECH 2016, San Francisco, USA, 2016.
short: 'L. Drude, B. Raj, R. Haeb-Umbach, in: INTERSPEECH 2016, San Francisco, USA,
2016.'
date_created: 2019-07-12T05:27:39Z
date_updated: 2022-01-06T06:51:08Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2016/interspeech_2016_drude_paper.pdf
oa: '1'
publication: INTERSPEECH 2016, San Francisco, USA
related_material:
link:
- description: Poster
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2016/interspeech_2016_drude_slides.pdf
status: public
title: On the appropriateness of complex-valued neural networks for speech enhancement
type: conference
user_id: '44006'
year: '2016'
...
---
_id: '11771'
abstract:
- lang: eng
text: This paper is concerned with speech presence probability estimation employing
an explicit model of the temporal and spectral correlations of speech. An undirected
graphical model is introduced, based on a Factor Graph formulation. It is shown
that this undirected model cures some of the theoretical issues of an earlier
directed graphical model. Furthermore, we formulate a message passing inference
scheme based on an approximate graph factorization, identify this inference scheme
as a particular message passing schedule based on the turbo principle and suggest
further alternative schedules. The experiments show an improved performance over
speech presence probability estimation based on an IID assumption, and a slightly
better performance of the turbo schedule over the alternatives.
author:
- first_name: Thomas
full_name: Glarner, Thomas
id: '14169'
last_name: Glarner
- first_name: Mohammad
full_name: Mahdi Momenzadeh, Mohammad
last_name: Mahdi Momenzadeh
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Glarner T, Mahdi Momenzadeh M, Drude L, Haeb-Umbach R. Factor Graph Decoding
for Speech Presence Probability Estimation. In: 12. ITG Fachtagung Sprachkommunikation
(ITG 2016). ; 2016.'
apa: Glarner, T., Mahdi Momenzadeh, M., Drude, L., & Haeb-Umbach, R. (2016).
Factor Graph Decoding for Speech Presence Probability Estimation. In 12. ITG
Fachtagung Sprachkommunikation (ITG 2016).
bibtex: '@inproceedings{Glarner_Mahdi Momenzadeh_Drude_Haeb-Umbach_2016, title={Factor
Graph Decoding for Speech Presence Probability Estimation}, booktitle={12. ITG
Fachtagung Sprachkommunikation (ITG 2016)}, author={Glarner, Thomas and Mahdi
Momenzadeh, Mohammad and Drude, Lukas and Haeb-Umbach, Reinhold}, year={2016}
}'
chicago: Glarner, Thomas, Mohammad Mahdi Momenzadeh, Lukas Drude, and Reinhold Haeb-Umbach.
“Factor Graph Decoding for Speech Presence Probability Estimation.” In 12.
ITG Fachtagung Sprachkommunikation (ITG 2016), 2016.
ieee: T. Glarner, M. Mahdi Momenzadeh, L. Drude, and R. Haeb-Umbach, “Factor Graph
Decoding for Speech Presence Probability Estimation,” in 12. ITG Fachtagung
Sprachkommunikation (ITG 2016), 2016.
mla: Glarner, Thomas, et al. “Factor Graph Decoding for Speech Presence Probability
Estimation.” 12. ITG Fachtagung Sprachkommunikation (ITG 2016), 2016.
short: 'T. Glarner, M. Mahdi Momenzadeh, L. Drude, R. Haeb-Umbach, in: 12. ITG Fachtagung
Sprachkommunikation (ITG 2016), 2016.'
date_created: 2019-07-12T05:27:56Z
date_updated: 2022-01-06T06:51:08Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2016/itgspeech2016_08_Glarner.pdf
oa: '1'
publication: 12. ITG Fachtagung Sprachkommunikation (ITG 2016)
related_material:
link:
- description: Slides
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2016/itgspeech2016_08_Glarner_slides.pdf
status: public
title: Factor Graph Decoding for Speech Presence Probability Estimation
type: conference
user_id: '44006'
year: '2016'
...
---
_id: '11812'
author:
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Heymann J, Drude L, Haeb-Umbach R. Neural Network Based Spectral Mask Estimation
for Acoustic Beamforming. In: Proc. IEEE Intl. Conf. on Acoustics, Speech and
Signal Processing (ICASSP). ; 2016.'
apa: Heymann, J., Drude, L., & Haeb-Umbach, R. (2016). Neural Network Based
Spectral Mask Estimation for Acoustic Beamforming. In Proc. IEEE Intl. Conf.
on Acoustics, Speech and Signal Processing (ICASSP).
bibtex: '@inproceedings{Heymann_Drude_Haeb-Umbach_2016, title={Neural Network Based
Spectral Mask Estimation for Acoustic Beamforming}, booktitle={Proc. IEEE Intl.
Conf. on Acoustics, Speech and Signal Processing (ICASSP)}, author={Heymann, Jahn
and Drude, Lukas and Haeb-Umbach, Reinhold}, year={2016} }'
chicago: Heymann, Jahn, Lukas Drude, and Reinhold Haeb-Umbach. “Neural Network Based
Spectral Mask Estimation for Acoustic Beamforming.” In Proc. IEEE Intl. Conf.
on Acoustics, Speech and Signal Processing (ICASSP), 2016.
ieee: J. Heymann, L. Drude, and R. Haeb-Umbach, “Neural Network Based Spectral Mask
Estimation for Acoustic Beamforming,” in Proc. IEEE Intl. Conf. on Acoustics,
Speech and Signal Processing (ICASSP), 2016.
mla: Heymann, Jahn, et al. “Neural Network Based Spectral Mask Estimation for Acoustic
Beamforming.” Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing
(ICASSP), 2016.
short: 'J. Heymann, L. Drude, R. Haeb-Umbach, in: Proc. IEEE Intl. Conf. on Acoustics,
Speech and Signal Processing (ICASSP), 2016.'
date_created: 2019-07-12T05:28:44Z
date_updated: 2022-01-06T06:51:09Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2016/icassp_2016_heymann_paper.pdf
oa: '1'
publication: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)
related_material:
link:
- description: Slides
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2016/icassp_2016_heymann_slides.pdf
status: public
title: Neural Network Based Spectral Mask Estimation for Acoustic Beamforming
type: conference
user_id: '44006'
year: '2016'
...
---
_id: '11834'
abstract:
- lang: eng
text: We present a system for the 4th CHiME challenge which significantly increases
the performance for all three tracks with respect to the provided baseline system.
The front-end uses a bi-directional Long Short-Term Memory (BLSTM)-based neural
network to estimate signal statistics. These then steer a Generalized Eigenvalue
beamformer. The back-end consists of a 22-layer Wide Residual Network and
two extra BLSTM layers. Working on a whole utterance instead of frames allows
us to refine Batch-Normalization. We also train our own BLSTM-based language model.
Adding a discriminative speaker adaptation leads to further gains. The final system
achieves a word error rate of 3.48% on the six-channel real test data. For the
two-channel track we achieve 5.96% and for the one-channel track 9.34%. This is
the best reported performance on the challenge achieved by a single system, i.e.,
a configuration, which does not combine multiple systems. At the same time, our
system is independent of the microphone configuration. We can thus use the same
components for all three tracks.
author:
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Heymann J, Drude L, Haeb-Umbach R. Wide Residual BLSTM Network with Discriminative
Speaker Adaptation for Robust Speech Recognition. In: Computer Speech and Language.
; 2016.'
apa: Heymann, J., Drude, L., & Haeb-Umbach, R. (2016). Wide Residual BLSTM Network
with Discriminative Speaker Adaptation for Robust Speech Recognition. In Computer
Speech and Language.
bibtex: '@inproceedings{Heymann_Drude_Haeb-Umbach_2016, title={Wide Residual BLSTM
Network with Discriminative Speaker Adaptation for Robust Speech Recognition},
booktitle={Computer Speech and Language}, author={Heymann, Jahn and Drude, Lukas
and Haeb-Umbach, Reinhold}, year={2016} }'
chicago: Heymann, Jahn, Lukas Drude, and Reinhold Haeb-Umbach. “Wide Residual BLSTM
Network with Discriminative Speaker Adaptation for Robust Speech Recognition.”
In Computer Speech and Language, 2016.
ieee: J. Heymann, L. Drude, and R. Haeb-Umbach, “Wide Residual BLSTM Network with
Discriminative Speaker Adaptation for Robust Speech Recognition,” in Computer
Speech and Language, 2016.
mla: Heymann, Jahn, et al. “Wide Residual BLSTM Network with Discriminative Speaker
Adaptation for Robust Speech Recognition.” Computer Speech and Language,
2016.
short: 'J. Heymann, L. Drude, R. Haeb-Umbach, in: Computer Speech and Language,
2016.'
date_created: 2019-07-12T05:29:09Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2016/chime4_upbonly_paper.pdf
oa: '1'
publication: Computer Speech and Language
related_material:
link:
- description: Poster
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2016/chime4_upbonly_poster.pdf
status: public
title: Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust
Speech Recognition
type: conference
user_id: '44006'
year: '2016'
...
---
_id: '11908'
abstract:
- lang: eng
text: 'This paper describes automatic speech recognition (ASR) systems developed
jointly by RWTH, UPB and FORTH for the 1ch, 2ch and 6ch track of the 4th CHiME
Challenge. In the 2ch and 6ch tracks the final system output is obtained by a
Confusion Network Combination (CNC) of multiple systems. The Acoustic Model (AM)
is a deep neural network based on Bidirectional Long Short-Term Memory (BLSTM)
units. The systems differ by front ends and training sets used for the acoustic
training. The model for the 1ch track is trained without any preprocessing. For
each front end we trained and evaluated individual acoustic models. We compare
the ASR performance of different beamforming approaches: a conventional superdirective
beamformer [1] and an MVDR beamformer as in [2], where the steering vector is
estimated based on [3]. Furthermore, we evaluated a BLSTM supported Generalized
Eigenvalue beamformer using NN-GEV [4]. The back end is implemented using RWTH's
open-source toolkits RASR [5], RETURNN [6] and rwthlm [7]. We rescore lattices
with a Long Short-Term Memory (LSTM) based language model. The overall best results
are obtained by a system combination that includes the lattices from the system
of UPB's submission [8]. Our final submission scored second in each of the three
tracks of the 4th CHiME Challenge.'
author:
- first_name: Tobias
full_name: Menne, Tobias
last_name: Menne
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Anastasios
full_name: Alexandridis, Anastasios
last_name: Alexandridis
- first_name: Kazuki
full_name: Irie, Kazuki
last_name: Irie
- first_name: Albert
full_name: Zeyer, Albert
last_name: Zeyer
- first_name: Markus
full_name: Kitza, Markus
last_name: Kitza
- first_name: Pavel
full_name: Golik, Pavel
last_name: Golik
- first_name: Ilia
full_name: Kulikov, Ilia
last_name: Kulikov
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Ralf
full_name: Schlüter, Ralf
last_name: Schlüter
- first_name: Hermann
full_name: Ney, Hermann
last_name: Ney
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
- first_name: Athanasios
full_name: Mouchtaris, Athanasios
last_name: Mouchtaris
citation:
ama: 'Menne T, Heymann J, Alexandridis A, et al. The RWTH/UPB/FORTH System Combination
for the 4th CHiME Challenge Evaluation. In: Computer Speech and Language.
; 2016.'
apa: Menne, T., Heymann, J., Alexandridis, A., Irie, K., Zeyer, A., Kitza, M., …
Mouchtaris, A. (2016). The RWTH/UPB/FORTH System Combination for the 4th CHiME
Challenge Evaluation. In Computer Speech and Language.
bibtex: '@inproceedings{Menne_Heymann_Alexandridis_Irie_Zeyer_Kitza_Golik_Kulikov_Drude_Schlüter_et
al._2016, title={The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge
Evaluation}, booktitle={Computer Speech and Language}, author={Menne, Tobias and
Heymann, Jahn and Alexandridis, Anastasios and Irie, Kazuki and Zeyer, Albert
and Kitza, Markus and Golik, Pavel and Kulikov, Ilia and Drude, Lukas and Schlüter,
Ralf and et al.}, year={2016} }'
chicago: Menne, Tobias, Jahn Heymann, Anastasios Alexandridis, Kazuki Irie, Albert
Zeyer, Markus Kitza, Pavel Golik, et al. “The RWTH/UPB/FORTH System Combination
for the 4th CHiME Challenge Evaluation.” In Computer Speech and Language,
2016.
ieee: T. Menne et al., “The RWTH/UPB/FORTH System Combination for the 4th
CHiME Challenge Evaluation,” in Computer Speech and Language, 2016.
mla: Menne, Tobias, et al. “The RWTH/UPB/FORTH System Combination for the 4th CHiME
Challenge Evaluation.” Computer Speech and Language, 2016.
short: 'T. Menne, J. Heymann, A. Alexandridis, K. Irie, A. Zeyer, M. Kitza, P. Golik,
I. Kulikov, L. Drude, R. Schlüter, H. Ney, R. Haeb-Umbach, A. Mouchtaris, in:
Computer Speech and Language, 2016.'
date_created: 2019-07-12T05:30:35Z
date_updated: 2022-01-06T06:51:12Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2016/chime4_rwthupbforth_paper.pdf
oa: '1'
publication: Computer Speech and Language
status: public
title: The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge Evaluation
type: conference
user_id: '44006'
year: '2016'
...
---
_id: '11755'
abstract:
- lang: eng
text: This contribution presents a Direction of Arrival (DoA) estimation algorithm
based on the complex Watson distribution to incorporate both phase and level differences
of captured microphone array signals. The derived algorithm is reviewed in the
context of the Generalized State Coherence Transform (GSCT) on the one hand and
a kernel density estimation method on the other hand. A thorough simulative evaluation
yields insight into parameter selection and provides details on the performance
for both directional and omni-directional microphones. A comparison to the well-known
Steered Response Power with Phase Transform (SRP-PHAT) algorithm and a state-of-the-art
DoA estimator which explicitly accounts for aliasing shows in particular the advantages
of the presented algorithm if inter-sensor level differences are indicative
of the DoA, as with directional microphones.
author:
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Florian
full_name: Jacob, Florian
last_name: Jacob
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Drude L, Jacob F, Haeb-Umbach R. DOA-Estimation based on a Complex Watson
Kernel Method. In: 23rd European Signal Processing Conference (EUSIPCO 2015).
; 2015.'
apa: Drude, L., Jacob, F., & Haeb-Umbach, R. (2015). DOA-Estimation based on
a Complex Watson Kernel Method. In 23rd European Signal Processing Conference
(EUSIPCO 2015).
bibtex: '@inproceedings{Drude_Jacob_Haeb-Umbach_2015, title={DOA-Estimation based
on a Complex Watson Kernel Method}, booktitle={23rd European Signal Processing
Conference (EUSIPCO 2015)}, author={Drude, Lukas and Jacob, Florian and Haeb-Umbach,
Reinhold}, year={2015} }'
chicago: Drude, Lukas, Florian Jacob, and Reinhold Haeb-Umbach. “DOA-Estimation
Based on a Complex Watson Kernel Method.” In 23rd European Signal Processing
Conference (EUSIPCO 2015), 2015.
ieee: L. Drude, F. Jacob, and R. Haeb-Umbach, “DOA-Estimation based on a Complex
Watson Kernel Method,” in 23rd European Signal Processing Conference (EUSIPCO
2015), 2015.
mla: Drude, Lukas, et al. “DOA-Estimation Based on a Complex Watson Kernel Method.”
23rd European Signal Processing Conference (EUSIPCO 2015), 2015.
short: 'L. Drude, F. Jacob, R. Haeb-Umbach, in: 23rd European Signal Processing
Conference (EUSIPCO 2015), 2015.'
date_created: 2019-07-12T05:27:38Z
date_updated: 2022-01-06T06:51:08Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2015/DrJaHa15.pdf
oa: '1'
publication: 23rd European Signal Processing Conference (EUSIPCO 2015)
related_material:
link:
- description: Presentation
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2015/DrJaHa15_Presentation.pdf
status: public
title: DOA-Estimation based on a Complex Watson Kernel Method
type: conference
user_id: '44006'
year: '2015'
...
---
_id: '11810'
author:
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Aleksej
full_name: Chinaev, Aleksej
last_name: Chinaev
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Heymann J, Drude L, Chinaev A, Haeb-Umbach R. BLSTM supported GEV Beamformer
Front-End for the 3RD CHiME Challenge. In: Automatic Speech Recognition and
Understanding Workshop (ASRU 2015). ; 2015.'
apa: Heymann, J., Drude, L., Chinaev, A., & Haeb-Umbach, R. (2015). BLSTM supported
GEV Beamformer Front-End for the 3RD CHiME Challenge. In Automatic Speech Recognition
and Understanding Workshop (ASRU 2015).
bibtex: '@inproceedings{Heymann_Drude_Chinaev_Haeb-Umbach_2015, title={BLSTM supported
GEV Beamformer Front-End for the 3RD CHiME Challenge}, booktitle={Automatic Speech
Recognition and Understanding Workshop (ASRU 2015)}, author={Heymann, Jahn and
Drude, Lukas and Chinaev, Aleksej and Haeb-Umbach, Reinhold}, year={2015} }'
chicago: Heymann, Jahn, Lukas Drude, Aleksej Chinaev, and Reinhold Haeb-Umbach.
“BLSTM Supported GEV Beamformer Front-End for the 3RD CHiME Challenge.” In Automatic
Speech Recognition and Understanding Workshop (ASRU 2015), 2015.
ieee: J. Heymann, L. Drude, A. Chinaev, and R. Haeb-Umbach, “BLSTM supported GEV
Beamformer Front-End for the 3RD CHiME Challenge,” in Automatic Speech Recognition
and Understanding Workshop (ASRU 2015), 2015.
mla: Heymann, Jahn, et al. “BLSTM Supported GEV Beamformer Front-End for the 3RD
CHiME Challenge.” Automatic Speech Recognition and Understanding Workshop (ASRU
2015), 2015.
short: 'J. Heymann, L. Drude, A. Chinaev, R. Haeb-Umbach, in: Automatic Speech Recognition
and Understanding Workshop (ASRU 2015), 2015.'
date_created: 2019-07-12T05:28:41Z
date_updated: 2022-01-06T06:51:09Z
department:
- _id: '54'
language:
- iso: eng
publication: Automatic Speech Recognition and Understanding Workshop (ASRU 2015)
status: public
title: BLSTM supported GEV Beamformer Front-End for the 3RD CHiME Challenge
type: conference
user_id: '44006'
year: '2015'
...
---
_id: '11919'
abstract:
- lang: eng
text: In this paper we present a source counting algorithm to determine the number
of speakers in a speech mixture. In our proposed method, we model the histogram
of estimated directions of arrival with a nonparametric Bayesian infinite Gaussian
mixture model. As an alternative to classical model selection criteria and to
avoid specifying the maximum number of mixture components in advance, a Dirichlet
process prior is employed over the mixture components. This allows the optimal
number of mixture components, i.e. the one that most probably models the observations,
to be determined automatically. We demonstrate experimentally that this model outperforms a parametric
approach using a finite Gaussian mixture model with a Dirichlet distribution prior
over the mixture weights.
author:
- first_name: Oliver
full_name: Walter, Oliver
last_name: Walter
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Walter O, Drude L, Haeb-Umbach R. Source Counting in Speech Mixtures by Nonparametric
Bayesian Estimation of an infinite Gaussian Mixture Model. In: 40th International
Conference on Acoustics, Speech and Signal Processing (ICASSP 2015). ; 2015.'
apa: Walter, O., Drude, L., & Haeb-Umbach, R. (2015). Source Counting in Speech
Mixtures by Nonparametric Bayesian Estimation of an infinite Gaussian Mixture
Model. In 40th International Conference on Acoustics, Speech and Signal Processing
(ICASSP 2015).
bibtex: '@inproceedings{Walter_Drude_Haeb-Umbach_2015, title={Source Counting in
Speech Mixtures by Nonparametric Bayesian Estimation of an infinite Gaussian Mixture
Model}, booktitle={40th International Conference on Acoustics, Speech and Signal
Processing (ICASSP 2015)}, author={Walter, Oliver and Drude, Lukas and Haeb-Umbach,
Reinhold}, year={2015} }'
chicago: Walter, Oliver, Lukas Drude, and Reinhold Haeb-Umbach. “Source Counting
in Speech Mixtures by Nonparametric Bayesian Estimation of an Infinite Gaussian
Mixture Model.” In 40th International Conference on Acoustics, Speech and Signal
Processing (ICASSP 2015), 2015.
ieee: O. Walter, L. Drude, and R. Haeb-Umbach, “Source Counting in Speech Mixtures
by Nonparametric Bayesian Estimation of an infinite Gaussian Mixture Model,” in
40th International Conference on Acoustics, Speech and Signal Processing (ICASSP
2015), 2015.
mla: Walter, Oliver, et al. “Source Counting in Speech Mixtures by Nonparametric
Bayesian Estimation of an Infinite Gaussian Mixture Model.” 40th International
Conference on Acoustics, Speech and Signal Processing (ICASSP 2015), 2015.
short: 'O. Walter, L. Drude, R. Haeb-Umbach, in: 40th International Conference on
Acoustics, Speech and Signal Processing (ICASSP 2015), 2015.'
date_created: 2019-07-12T05:30:47Z
date_updated: 2022-01-06T06:51:12Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2015/WaDrHa15.pdf
oa: '1'
publication: 40th International Conference on Acoustics, Speech and Signal Processing
(ICASSP 2015)
related_material:
link:
- description: Poster
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2015/WaDrHa15_Poster.pdf
status: public
title: Source Counting in Speech Mixtures by Nonparametric Bayesian Estimation of
an infinite Gaussian Mixture Model
type: conference
user_id: '44006'
year: '2015'
...