---
_id: '11965'
abstract:
- lang: eng
text: 'We present an unsupervised training approach for a neural network-based mask
estimator in an acoustic beamforming application. The network is trained to maximize
a likelihood criterion derived from a spatial mixture model of the observations.
It is trained from scratch without requiring any parallel data consisting of degraded
input and clean training targets. Thus, training can be carried out on real recordings
of noisy speech rather than simulated ones. In contrast to previous work on unsupervised
training of neural mask estimators, our approach avoids the need for a possibly
pre-trained teacher model entirely. We demonstrate the effectiveness of our approach
by speech recognition experiments on two different datasets: one mainly deteriorated
by noise (CHiME 4) and one by reverberation (REVERB). The results show that the
performance of the proposed system is on par with a supervised system using oracle
target masks for training and with a system trained using a model-based teacher.'
author:
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Drude L, Heymann J, Haeb-Umbach R. Unsupervised training of neural mask-based
beamforming. In: INTERSPEECH 2019, Graz, Austria. ; 2019.'
apa: Drude, L., Heymann, J., & Haeb-Umbach, R. (2019). Unsupervised training
of neural mask-based beamforming. In INTERSPEECH 2019, Graz, Austria.
bibtex: '@inproceedings{Drude_Heymann_Haeb-Umbach_2019, title={Unsupervised training
of neural mask-based beamforming}, booktitle={INTERSPEECH 2019, Graz, Austria},
author={Drude, Lukas and Heymann, Jahn and Haeb-Umbach, Reinhold}, year={2019}
}'
chicago: Drude, Lukas, Jahn Heymann, and Reinhold Haeb-Umbach. “Unsupervised Training
of Neural Mask-Based Beamforming.” In INTERSPEECH 2019, Graz, Austria,
2019.
ieee: L. Drude, J. Heymann, and R. Haeb-Umbach, “Unsupervised training of neural
mask-based beamforming,” in INTERSPEECH 2019, Graz, Austria, 2019.
mla: Drude, Lukas, et al. “Unsupervised Training of Neural Mask-Based Beamforming.”
INTERSPEECH 2019, Graz, Austria, 2019.
short: 'L. Drude, J. Heymann, R. Haeb-Umbach, in: INTERSPEECH 2019, Graz, Austria,
2019.'
date_created: 2019-07-18T09:11:39Z
date_updated: 2022-01-06T06:51:14Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
content_type: application/pdf
creator: huesera
date_created: 2019-08-13T06:36:44Z
date_updated: 2019-08-13T06:41:35Z
file_id: '12914'
file_name: INTERSPEECH_2019_Drude_Paper.pdf
file_size: 223413
relation: main_file
file_date_updated: 2019-08-13T06:41:35Z
has_accepted_license: '1'
language:
- iso: eng
license: https://creativecommons.org/publicdomain/zero/1.0/
oa: '1'
project:
- _id: '52'
name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: INTERSPEECH 2019, Graz, Austria
status: public
title: Unsupervised training of neural mask-based beamforming
type: conference
user_id: '59789'
year: '2019'
...
---
_id: '12874'
abstract:
- lang: eng
text: We propose a training scheme to train neural network-based source separation
algorithms from scratch when parallel clean data is unavailable. In particular,
we demonstrate that an unsupervised spatial clustering algorithm is sufficient
to guide the training of a deep clustering system. We argue that previous work
on deep clustering requires strong supervision and elaborate on why this is a
limitation. We demonstrate that (a) the single-channel deep clustering system
trained according to the proposed scheme alone is able to achieve performance similar
to that of the multi-channel teacher in terms of word error rates and (b) initializing
the spatial clustering approach with the deep clustering result yields a relative
word error rate reduction of 26% over the unsupervised teacher.
author:
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Daniel
full_name: Hasenklever, Daniel
last_name: Hasenklever
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Drude L, Hasenklever D, Haeb-Umbach R. Unsupervised Training of a Deep Clustering
Model for Multichannel Blind Source Separation. In: ICASSP 2019, Brighton,
UK. ; 2019.'
apa: Drude, L., Hasenklever, D., & Haeb-Umbach, R. (2019). Unsupervised Training
of a Deep Clustering Model for Multichannel Blind Source Separation. In ICASSP
2019, Brighton, UK.
bibtex: '@inproceedings{Drude_Hasenklever_Haeb-Umbach_2019, title={Unsupervised
Training of a Deep Clustering Model for Multichannel Blind Source Separation},
booktitle={ICASSP 2019, Brighton, UK}, author={Drude, Lukas and Hasenklever, Daniel
and Haeb-Umbach, Reinhold}, year={2019} }'
chicago: Drude, Lukas, Daniel Hasenklever, and Reinhold Haeb-Umbach. “Unsupervised
Training of a Deep Clustering Model for Multichannel Blind Source Separation.”
In ICASSP 2019, Brighton, UK, 2019.
ieee: L. Drude, D. Hasenklever, and R. Haeb-Umbach, “Unsupervised Training of a
Deep Clustering Model for Multichannel Blind Source Separation,” in ICASSP
2019, Brighton, UK, 2019.
mla: Drude, Lukas, et al. “Unsupervised Training of a Deep Clustering Model for
Multichannel Blind Source Separation.” ICASSP 2019, Brighton, UK, 2019.
short: 'L. Drude, D. Hasenklever, R. Haeb-Umbach, in: ICASSP 2019, Brighton, UK,
2019.'
date_created: 2019-07-23T07:37:54Z
date_updated: 2022-01-06T06:51:21Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
content_type: application/pdf
creator: huesera
date_created: 2019-08-14T07:19:13Z
date_updated: 2019-08-14T07:19:13Z
file_id: '12925'
file_name: ICASSP_2019_Drude_Paper.pdf
file_size: 368225
relation: main_file
file_date_updated: 2019-08-14T07:19:13Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: ICASSP 2019, Brighton, UK
status: public
title: Unsupervised Training of a Deep Clustering Model for Multichannel Blind Source
Separation
type: conference
user_id: '59789'
year: '2019'
...
---
_id: '12875'
abstract:
- lang: eng
text: Signal dereverberation using the Weighted Prediction Error (WPE) method has
been proven to be an effective means to raise the accuracy of far-field speech
recognition. It was first proposed as an iterative algorithm; follow-up works have reformulated
it as a recursive least squares algorithm and thereby enabled its use in online
applications. For this algorithm, the estimation of the power spectral density
(PSD) of the anechoic signal plays an important role and strongly influences its
performance. Recently, we showed that using a neural network PSD estimator leads
to improved performance for online automatic speech recognition. This, however,
comes at a price. To train the network, we require parallel data, i.e., utterances
simultaneously available in clean and reverberated form. Here we propose to overcome
this limitation by training the network jointly with the acoustic model of the
speech recognizer. To be specific, the gradients computed from the cross-entropy
loss between the target senone sequence and the acoustic model network output
are backpropagated through the complex-valued dereverberation filter estimation
to the neural network for PSD estimation. Evaluation on two databases demonstrates
improved performance for online processing scenarios while imposing fewer requirements
on the available training data and thus widening the range of applications.
author:
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
- first_name: Keisuke
full_name: Kinoshita, Keisuke
last_name: Kinoshita
- first_name: Tomohiro
full_name: Nakatani, Tomohiro
last_name: Nakatani
citation:
ama: 'Heymann J, Drude L, Haeb-Umbach R, Kinoshita K, Nakatani T. Joint Optimization
of Neural Network-based WPE Dereverberation and Acoustic Model for Robust Online
ASR. In: ICASSP 2019, Brighton, UK. ; 2019.'
apa: Heymann, J., Drude, L., Haeb-Umbach, R., Kinoshita, K., & Nakatani, T.
(2019). Joint Optimization of Neural Network-based WPE Dereverberation and Acoustic
Model for Robust Online ASR. In ICASSP 2019, Brighton, UK.
bibtex: '@inproceedings{Heymann_Drude_Haeb-Umbach_Kinoshita_Nakatani_2019, title={Joint
Optimization of Neural Network-based WPE Dereverberation and Acoustic Model for
Robust Online ASR}, booktitle={ICASSP 2019, Brighton, UK}, author={Heymann, Jahn
and Drude, Lukas and Haeb-Umbach, Reinhold and Kinoshita, Keisuke and Nakatani,
Tomohiro}, year={2019} }'
chicago: Heymann, Jahn, Lukas Drude, Reinhold Haeb-Umbach, Keisuke Kinoshita, and
Tomohiro Nakatani. “Joint Optimization of Neural Network-Based WPE Dereverberation
and Acoustic Model for Robust Online ASR.” In ICASSP 2019, Brighton, UK,
2019.
ieee: J. Heymann, L. Drude, R. Haeb-Umbach, K. Kinoshita, and T. Nakatani, “Joint
Optimization of Neural Network-based WPE Dereverberation and Acoustic Model for
Robust Online ASR,” in ICASSP 2019, Brighton, UK, 2019.
mla: Heymann, Jahn, et al. “Joint Optimization of Neural Network-Based WPE Dereverberation
and Acoustic Model for Robust Online ASR.” ICASSP 2019, Brighton, UK, 2019.
short: 'J. Heymann, L. Drude, R. Haeb-Umbach, K. Kinoshita, T. Nakatani, in: ICASSP
2019, Brighton, UK, 2019.'
date_created: 2019-07-23T07:42:26Z
date_updated: 2022-01-06T06:51:22Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
content_type: application/pdf
creator: huesera
date_created: 2019-12-17T07:28:06Z
date_updated: 2019-12-17T07:28:06Z
file_id: '15334'
file_name: ICASSP_2019_Heymann_Paper.pdf
file_size: 199109
relation: main_file
file_date_updated: 2019-12-17T07:28:06Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: ICASSP 2019, Brighton, UK
status: public
title: Joint Optimization of Neural Network-based WPE Dereverberation and Acoustic
Model for Robust Online ASR
type: conference
user_id: '59789'
year: '2019'
...
---
_id: '12876'
abstract:
- lang: eng
text: In this paper, we present libDirectional, a MATLAB library for directional
statistics and directional estimation. It supports a variety of commonly used
distributions on the unit circle, such as the von Mises, wrapped normal, and wrapped
Cauchy distributions. Furthermore, various distributions on higher-dimensional
manifolds such as the unit hypersphere and the hypertorus are available. Based
on these distributions, several recursive filtering algorithms in libDirectional
allow estimation on these manifolds. The functionality is implemented in a clear,
well-documented, and object-oriented structure that is both easy to use and easy
to extend.
author:
- first_name: Gerhard
full_name: Kurz, Gerhard
last_name: Kurz
- first_name: Igor
full_name: Gilitschenski, Igor
last_name: Gilitschenski
- first_name: Florian
full_name: Pfaff, Florian
last_name: Pfaff
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Uwe D.
full_name: Hanebeck, Uwe D.
last_name: Hanebeck
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
- first_name: Roland Y.
full_name: Siegwart, Roland Y.
last_name: Siegwart
citation:
ama: 'Kurz G, Gilitschenski I, Pfaff F, et al. Directional Statistics and Filtering
Using libDirectional. In: Journal of Statistical Software 89(4). ; 2019.'
apa: Kurz, G., Gilitschenski, I., Pfaff, F., Drude, L., Hanebeck, U. D., Haeb-Umbach,
R., & Siegwart, R. Y. (2019). Directional Statistics and Filtering Using libDirectional.
In Journal of Statistical Software 89(4).
bibtex: '@inproceedings{Kurz_Gilitschenski_Pfaff_Drude_Hanebeck_Haeb-Umbach_Siegwart_2019,
title={Directional Statistics and Filtering Using libDirectional}, booktitle={Journal
of Statistical Software 89(4)}, author={Kurz, Gerhard and Gilitschenski, Igor
and Pfaff, Florian and Drude, Lukas and Hanebeck, Uwe D. and Haeb-Umbach, Reinhold
and Siegwart, Roland Y.}, year={2019} }'
chicago: Kurz, Gerhard, Igor Gilitschenski, Florian Pfaff, Lukas Drude, Uwe D. Hanebeck,
Reinhold Haeb-Umbach, and Roland Y. Siegwart. “Directional Statistics and Filtering
Using LibDirectional.” In Journal of Statistical Software 89(4), 2019.
ieee: G. Kurz et al., “Directional Statistics and Filtering Using libDirectional,”
in Journal of Statistical Software 89(4), 2019.
mla: Kurz, Gerhard, et al. “Directional Statistics and Filtering Using LibDirectional.”
Journal of Statistical Software 89(4), 2019.
short: 'G. Kurz, I. Gilitschenski, F. Pfaff, L. Drude, U.D. Hanebeck, R. Haeb-Umbach,
R.Y. Siegwart, in: Journal of Statistical Software 89(4), 2019.'
date_created: 2019-07-23T07:44:59Z
date_updated: 2022-01-06T06:51:22Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
content_type: application/pdf
creator: huesera
date_created: 2019-08-14T07:16:05Z
date_updated: 2019-08-14T07:16:05Z
file_id: '12923'
file_name: JournalofStatisticalSoftware_2019_Drude_Paper.pdf
file_size: 1522964
relation: main_file
file_date_updated: 2019-08-14T07:16:05Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
publication: Journal of Statistical Software 89(4)
status: public
title: Directional Statistics and Filtering Using libDirectional
type: conference
user_id: '59789'
year: '2019'
...
---
_id: '12890'
abstract:
- lang: eng
text: 'We formulate a generic framework for blind source separation (BSS), which
allows integrating data-driven spectro-temporal methods, such as deep clustering
and deep attractor networks, with physically motivated probabilistic spatial methods,
such as complex angular central Gaussian mixture models. The integrated model
exploits the complementary strengths of the two approaches to BSS: the strong
modeling power of neural networks, which, however, is based on supervised learning,
and the ease of unsupervised learning of the spatial mixture models whose few
parameters can be estimated on as little as a single segment of a real mixture
of speech. Experiments are carried out on both artificially mixed speech and true
recordings of speech mixtures. The experiments verify that the integrated models
consistently outperform the individual components. We further extend the models
to cope with noisy, reverberant speech and introduce a cross-domain teacher–student
training where the mixture model serves as the teacher to provide training targets
for the student neural network.'
author:
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: Drude L, Haeb-Umbach R. Integration of Neural Networks and Probabilistic Spatial
Models for Acoustic Blind Source Separation. IEEE Journal of Selected Topics
in Signal Processing. 2019. doi:10.1109/JSTSP.2019.2912565
apa: Drude, L., & Haeb-Umbach, R. (2019). Integration of Neural Networks and
Probabilistic Spatial Models for Acoustic Blind Source Separation. IEEE Journal
of Selected Topics in Signal Processing. https://doi.org/10.1109/JSTSP.2019.2912565
bibtex: '@article{Drude_Haeb-Umbach_2019, title={Integration of Neural Networks
and Probabilistic Spatial Models for Acoustic Blind Source Separation}, DOI={10.1109/JSTSP.2019.2912565},
journal={IEEE Journal of Selected Topics in Signal Processing}, author={Drude,
Lukas and Haeb-Umbach, Reinhold}, year={2019} }'
chicago: Drude, Lukas, and Reinhold Haeb-Umbach. “Integration of Neural Networks
and Probabilistic Spatial Models for Acoustic Blind Source Separation.” IEEE
Journal of Selected Topics in Signal Processing, 2019. https://doi.org/10.1109/JSTSP.2019.2912565.
ieee: L. Drude and R. Haeb-Umbach, “Integration of Neural Networks and Probabilistic
Spatial Models for Acoustic Blind Source Separation,” IEEE Journal of Selected
Topics in Signal Processing, 2019.
mla: Drude, Lukas, and Reinhold Haeb-Umbach. “Integration of Neural Networks and
Probabilistic Spatial Models for Acoustic Blind Source Separation.” IEEE Journal
of Selected Topics in Signal Processing, 2019, doi:10.1109/JSTSP.2019.2912565.
short: L. Drude, R. Haeb-Umbach, IEEE Journal of Selected Topics in Signal Processing
(2019).
date_created: 2019-07-26T08:38:46Z
date_updated: 2022-01-06T06:51:23Z
ddc:
- '050'
department:
- _id: '54'
doi: 10.1109/JSTSP.2019.2912565
file:
- access_level: open_access
content_type: application/pdf
creator: huesera
date_created: 2019-08-07T07:12:21Z
date_updated: 2019-08-14T07:11:22Z
file_id: '12903'
file_name: IEEE Jounal_2019_Drude_Paper.pdf
file_size: 967424
relation: main_file
file_date_updated: 2019-08-14T07:11:22Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: IEEE Journal of Selected Topics in Signal Processing
publication_identifier:
eissn:
- 1941-0484
status: public
title: Integration of Neural Networks and Probabilistic Spatial Models for Acoustic
Blind Source Separation
type: journal_article
user_id: '11213'
year: '2019'
...
---
_id: '15796'
abstract:
- lang: eng
text: In this paper we consider human daily activity recognition using an acoustic
sensor network (ASN) which consists of nodes distributed in a home environment.
Assuming that the ASN is permanently recording, the vast majority of recordings
is silence. Therefore, we propose to employ a computationally efficient two-stage
sound recognition system, consisting of an initial sound activity detection (SAD)
and a subsequent sound event classification (SEC), which is only activated once
sound activity has been detected. We show how a low-latency activity detector
with high temporal resolution can be trained from weak labels with low temporal
resolution. We further demonstrate the advantage of using spatial features for
the subsequent event classification task.
author:
- first_name: Janek
full_name: Ebbers, Janek
id: '34851'
last_name: Ebbers
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
- first_name: Andreas
full_name: Brendel, Andreas
last_name: Brendel
- first_name: Walter
full_name: Kellermann, Walter
last_name: Kellermann
citation:
ama: 'Ebbers J, Drude L, Haeb-Umbach R, Brendel A, Kellermann W. Weakly Supervised
Sound Activity Detection and Event Classification in Acoustic Sensor Networks.
In: CAMSAP 2019, Guadeloupe, West Indies. ; 2019.'
apa: Ebbers, J., Drude, L., Haeb-Umbach, R., Brendel, A., & Kellermann, W. (2019).
Weakly Supervised Sound Activity Detection and Event Classification in Acoustic
Sensor Networks. CAMSAP 2019, Guadeloupe, West Indies.
bibtex: '@inproceedings{Ebbers_Drude_Haeb-Umbach_Brendel_Kellermann_2019, title={Weakly
Supervised Sound Activity Detection and Event Classification in Acoustic Sensor
Networks}, booktitle={CAMSAP 2019, Guadeloupe, West Indies}, author={Ebbers, Janek
and Drude, Lukas and Haeb-Umbach, Reinhold and Brendel, Andreas and Kellermann,
Walter}, year={2019} }'
chicago: Ebbers, Janek, Lukas Drude, Reinhold Haeb-Umbach, Andreas Brendel, and
Walter Kellermann. “Weakly Supervised Sound Activity Detection and Event Classification
in Acoustic Sensor Networks.” In CAMSAP 2019, Guadeloupe, West Indies,
2019.
ieee: J. Ebbers, L. Drude, R. Haeb-Umbach, A. Brendel, and W. Kellermann, “Weakly
Supervised Sound Activity Detection and Event Classification in Acoustic Sensor
Networks,” 2019.
mla: Ebbers, Janek, et al. “Weakly Supervised Sound Activity Detection and Event
Classification in Acoustic Sensor Networks.” CAMSAP 2019, Guadeloupe, West
Indies, 2019.
short: 'J. Ebbers, L. Drude, R. Haeb-Umbach, A. Brendel, W. Kellermann, in: CAMSAP
2019, Guadeloupe, West Indies, 2019.'
date_created: 2020-02-05T10:20:17Z
date_updated: 2023-11-22T08:29:58Z
ddc:
- '000'
department:
- _id: '54'
file:
- access_level: open_access
content_type: application/pdf
creator: huesera
date_created: 2020-02-05T10:21:39Z
date_updated: 2020-02-05T10:21:39Z
file_id: '15797'
file_name: CAMSAP_2019_WS_Ebbers_Paper.pdf
file_size: 311887
relation: main_file
file_date_updated: 2020-02-05T10:21:39Z
has_accepted_license: '1'
language:
- iso: eng
oa: '1'
project:
- _id: '52'
name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: CAMSAP 2019, Guadeloupe, West Indies
quality_controlled: '1'
status: public
title: Weakly Supervised Sound Activity Detection and Event Classification in Acoustic
Sensor Networks
type: conference
user_id: '34851'
year: '2019'
...
---
_id: '11835'
abstract:
- lang: eng
text: Signal dereverberation using the weighted prediction error (WPE) method has
been proven to be an effective means to raise the accuracy of far-field speech
recognition. But in its original formulation, WPE requires multiple iterations
over a sufficiently long utterance, rendering it unsuitable for online low-latency
applications. Recently, two methods have been proposed to overcome this limitation.
One utilizes a neural network to estimate the power spectral density (PSD) of
the target signal and works in a block-online fashion. The other method relies
on a rather simple PSD estimation which smoothes the observed PSD and utilizes
a recursive formulation which enables it to work on a frame-by-frame basis. In
this paper, we integrate a deep neural network (DNN) based estimator into the
recursive frame-online formulation. We evaluate the performance of the recursive
system with different PSD estimators in comparison to the block-online and offline
variants on two distinct corpora, namely the REVERB challenge data, where the signal is
mainly deteriorated by reverberation, and a database which combines WSJ and VoiceHome
to also consider (directed) noise sources. The results show that although smoothing
works surprisingly well, the more sophisticated DNN-based estimator shows promising
improvements and narrows the performance gap between online and offline processing.
author:
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
- first_name: Keisuke
full_name: Kinoshita, Keisuke
last_name: Kinoshita
- first_name: Tomohiro
full_name: Nakatani, Tomohiro
last_name: Nakatani
citation:
ama: 'Heymann J, Drude L, Haeb-Umbach R, Kinoshita K, Nakatani T. Frame-Online DNN-WPE
Dereverberation. In: IWAENC 2018, Tokyo, Japan. ; 2018.'
apa: Heymann, J., Drude, L., Haeb-Umbach, R., Kinoshita, K., & Nakatani, T.
(2018). Frame-Online DNN-WPE Dereverberation. In IWAENC 2018, Tokyo, Japan.
bibtex: '@inproceedings{Heymann_Drude_Haeb-Umbach_Kinoshita_Nakatani_2018, title={Frame-Online
DNN-WPE Dereverberation}, booktitle={IWAENC 2018, Tokyo, Japan}, author={Heymann,
Jahn and Drude, Lukas and Haeb-Umbach, Reinhold and Kinoshita, Keisuke and Nakatani,
Tomohiro}, year={2018} }'
chicago: Heymann, Jahn, Lukas Drude, Reinhold Haeb-Umbach, Keisuke Kinoshita, and
Tomohiro Nakatani. “Frame-Online DNN-WPE Dereverberation.” In IWAENC 2018,
Tokyo, Japan, 2018.
ieee: J. Heymann, L. Drude, R. Haeb-Umbach, K. Kinoshita, and T. Nakatani, “Frame-Online
DNN-WPE Dereverberation,” in IWAENC 2018, Tokyo, Japan, 2018.
mla: Heymann, Jahn, et al. “Frame-Online DNN-WPE Dereverberation.” IWAENC 2018,
Tokyo, Japan, 2018.
short: 'J. Heymann, L. Drude, R. Haeb-Umbach, K. Kinoshita, T. Nakatani, in: IWAENC
2018, Tokyo, Japan, 2018.'
date_created: 2019-07-12T05:29:10Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2018/IWAENC_2018_Heymann_Paper.pdf
oa: '1'
publication: IWAENC 2018, Tokyo, Japan
related_material:
link:
- description: Poster
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2018/IWAENC_2018_Heymann_Poster.pdf
status: public
title: Frame-Online DNN-WPE Dereverberation
type: conference
user_id: '44006'
year: '2018'
...
---
_id: '11872'
abstract:
- lang: eng
text: 'The weighted prediction error (WPE) algorithm has proven to be a very successful
dereverberation method for the REVERB challenge. Likewise, neural network based
mask estimation for beamforming demonstrated very good noise suppression in the
CHiME 3 and CHiME 4 challenges. Recently, it has been shown that this estimator
can also be trained to perform dereverberation and denoising jointly. However,
up to now a comparison of a neural beamformer and WPE is still missing, as is
an investigation into a combination of the two. Therefore, we here provide an
extensive evaluation of both and consequently propose variants to integrate deep
neural network based beamforming with WPE. For these integrated variants we identify
a consistent word error rate (WER) reduction on two distinct databases. In particular,
our study shows that deep learning based beamforming benefits from a model-based
dereverberation technique (i.e. WPE) and vice versa. Our key findings are: (a)
The more channels and noise are present, the lower the WERs of neural beamforming
are in comparison to WPE. (b) Integration of WPE and a neural beamformer consistently
outperforms all stand-alone systems.'
author:
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Christoph
full_name: Boeddeker, Christoph
id: '40767'
last_name: Boeddeker
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Keisuke
full_name: Kinoshita, Keisuke
last_name: Kinoshita
- first_name: Marc
full_name: Delcroix, Marc
last_name: Delcroix
- first_name: Tomohiro
full_name: Nakatani, Tomohiro
last_name: Nakatani
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Drude L, Boeddeker C, Heymann J, et al. Integrating neural network based beamforming
and weighted prediction error dereverberation. In: INTERSPEECH 2018, Hyderabad,
India. ; 2018.'
apa: Drude, L., Boeddeker, C., Heymann, J., Kinoshita, K., Delcroix, M., Nakatani,
T., & Haeb-Umbach, R. (2018). Integrating neural network based beamforming
and weighted prediction error dereverberation. In INTERSPEECH 2018, Hyderabad,
India.
bibtex: '@inproceedings{Drude_Boeddeker_Heymann_Kinoshita_Delcroix_Nakatani_Haeb-Umbach_2018,
title={Integrating neural network based beamforming and weighted prediction error
dereverberation}, booktitle={INTERSPEECH 2018, Hyderabad, India}, author={Drude,
Lukas and Boeddeker, Christoph and Heymann, Jahn and Kinoshita, Keisuke and Delcroix,
Marc and Nakatani, Tomohiro and Haeb-Umbach, Reinhold}, year={2018} }'
chicago: Drude, Lukas, Christoph Boeddeker, Jahn Heymann, Keisuke Kinoshita, Marc
Delcroix, Tomohiro Nakatani, and Reinhold Haeb-Umbach. “Integrating Neural Network
Based Beamforming and Weighted Prediction Error Dereverberation.” In INTERSPEECH
2018, Hyderabad, India, 2018.
ieee: L. Drude et al., “Integrating neural network based beamforming and
weighted prediction error dereverberation,” in INTERSPEECH 2018, Hyderabad,
India, 2018.
mla: Drude, Lukas, et al. “Integrating Neural Network Based Beamforming and Weighted
Prediction Error Dereverberation.” INTERSPEECH 2018, Hyderabad, India,
2018.
short: 'L. Drude, C. Boeddeker, J. Heymann, K. Kinoshita, M. Delcroix, T. Nakatani,
R. Haeb-Umbach, in: INTERSPEECH 2018, Hyderabad, India, 2018.'
date_created: 2019-07-12T05:29:53Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2018/INTERSPEECH_2018_Drude_Paper.pdf
oa: '1'
project:
- _id: '52'
name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: INTERSPEECH 2018, Hyderabad, India
related_material:
link:
- description: Slides
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2018/INTERSPEECH_2018_Drude_Slides.pdf
status: public
title: Integrating neural network based beamforming and weighted prediction error
dereverberation
type: conference
user_id: '40767'
year: '2018'
...
---
_id: '11873'
abstract:
- lang: eng
text: NARA-WPE is a Python software package providing implementations of the weighted
prediction error (WPE) dereverberation algorithm. WPE has been shown to be a highly
effective tool for speech dereverberation, thus improving the perceptual quality
of the signal and improving the recognition performance of downstream automatic
speech recognition (ASR). It is suitable both for single-channel and multi-channel
applications. The package consists of (1) a NumPy implementation which can easily
be integrated into a custom Python toolchain, and (2) a TensorFlow implementation
which allows integration into larger computational graphs and enables backpropagation
through WPE to train more advanced front-ends. This package comprises an iterative
offline (batch) version, a block-online version, and a frame-online version which
can be used in moderately low latency applications, e.g. digital speech assistants.
author:
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Christoph
full_name: Boeddeker, Christoph
id: '40767'
last_name: Boeddeker
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Drude L, Heymann J, Boeddeker C, Haeb-Umbach R. NARA-WPE: A Python package
for weighted prediction error dereverberation in Numpy and Tensorflow for online
and offline processing. In: ITG 2018, Oldenburg, Germany. ; 2018.'
apa: 'Drude, L., Heymann, J., Boeddeker, C., & Haeb-Umbach, R. (2018). NARA-WPE:
A Python package for weighted prediction error dereverberation in Numpy and Tensorflow
for online and offline processing. In ITG 2018, Oldenburg, Germany.'
bibtex: '@inproceedings{Drude_Heymann_Boeddeker_Haeb-Umbach_2018, title={NARA-WPE:
A Python package for weighted prediction error dereverberation in Numpy and Tensorflow
for online and offline processing}, booktitle={ITG 2018, Oldenburg, Germany},
author={Drude, Lukas and Heymann, Jahn and Boeddeker, Christoph and Haeb-Umbach,
Reinhold}, year={2018} }'
chicago: 'Drude, Lukas, Jahn Heymann, Christoph Boeddeker, and Reinhold Haeb-Umbach.
“NARA-WPE: A Python Package for Weighted Prediction Error Dereverberation in Numpy
and Tensorflow for Online and Offline Processing.” In ITG 2018, Oldenburg,
Germany, 2018.'
ieee: 'L. Drude, J. Heymann, C. Boeddeker, and R. Haeb-Umbach, “NARA-WPE: A Python
package for weighted prediction error dereverberation in Numpy and Tensorflow
for online and offline processing,” in ITG 2018, Oldenburg, Germany, 2018.'
mla: 'Drude, Lukas, et al. “NARA-WPE: A Python Package for Weighted Prediction Error
Dereverberation in Numpy and Tensorflow for Online and Offline Processing.” ITG
2018, Oldenburg, Germany, 2018.'
short: 'L. Drude, J. Heymann, C. Boeddeker, R. Haeb-Umbach, in: ITG 2018, Oldenburg,
Germany, 2018.'
date_created: 2019-07-12T05:29:54Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2018/ITG_2018_Drude_Paper.pdf
oa: '1'
project:
- _id: '52'
name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: ITG 2018, Oldenburg, Germany
related_material:
link:
- description: Poster
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2018/ITG_2018_Drude_Poster.pdf
status: public
title: 'NARA-WPE: A Python package for weighted prediction error dereverberation in
Numpy and Tensorflow for online and offline processing'
type: conference
user_id: '40767'
year: '2018'
...
---
_id: '12898'
abstract:
- lang: eng
text: Deep clustering (DC) and deep attractor networks (DANs) are data-driven
methods for monaural blind source separation. Both approaches provide astonishing single
channel performance but have not yet been generalized to block-online processing.
When separating speech in a continuous stream with a block-online algorithm, it
needs to be determined in each block which of the output streams belongs to whom.
In this contribution we solve this block permutation problem by introducing an
additional speaker identification embedding to the DAN model structure. We motivate
this model decision by analyzing the embedding topology of DC and DANs and show
that DC and DANs themselves are not sufficient for speaker identification. This
model structure (a) improves the signal to distortion ratio (SDR) over a DAN baseline
and (b) provides up to 61% and up to 34% relative reduction in permutation error
rate and re-identification error rate compared to an i-vector baseline, respectively.
author:
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Thilo
full_name: von Neumann, Thilo
last_name: von Neumann
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Drude L, von Neumann T, Haeb-Umbach R. Deep Attractor Networks for Speaker
Re-Identification and Blind Source Separation. In: ICASSP 2018, Calgary, Canada.
; 2018.'
apa: Drude, L., von Neumann, T., & Haeb-Umbach, R. (2018). Deep Attractor Networks
for Speaker Re-Identification and Blind Source Separation. In ICASSP 2018,
Calgary, Canada.
bibtex: '@inproceedings{Drude_von Neumann_Haeb-Umbach_2018, title={Deep Attractor
Networks for Speaker Re-Identification and Blind Source Separation}, booktitle={ICASSP
2018, Calgary, Canada}, author={Drude, Lukas and von Neumann, Thilo and Haeb-Umbach,
Reinhold}, year={2018} }'
chicago: Drude, Lukas, Thilo von Neumann, and Reinhold Haeb-Umbach. “Deep Attractor
Networks for Speaker Re-Identification and Blind Source Separation.” In ICASSP
2018, Calgary, Canada, 2018.
ieee: L. Drude, T. von Neumann, and R. Haeb-Umbach, “Deep Attractor Networks for
Speaker Re-Identification and Blind Source Separation,” in ICASSP 2018, Calgary,
Canada, 2018.
mla: Drude, Lukas, et al. “Deep Attractor Networks for Speaker Re-Identification
and Blind Source Separation.” ICASSP 2018, Calgary, Canada, 2018.
short: 'L. Drude, T. von Neumann, R. Haeb-Umbach, in: ICASSP 2018, Calgary, Canada,
2018.'
date_created: 2019-07-30T14:22:53Z
date_updated: 2022-01-06T06:51:24Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2018/ICASSP_2018_Drude2_Paper.pdf
oa: '1'
publication: ICASSP 2018, Calgary, Canada
related_material:
link:
- description: Slides
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2018/ICASSP_2018_Drude2_Slides.pdf
status: public
title: Deep Attractor Networks for Speaker Re-Identification and Blind Source Separation
type: conference
user_id: '44006'
year: '2018'
...
---
_id: '12900'
abstract:
- lang: eng
text: 'Deep attractor networks (DANs) are a recently introduced method to blindly
separate sources from spectral features of a monaural recording using bidirectional
long short-term memory networks (BLSTMs). Due to the nature of BLSTMs, this method is
inherently not online-ready, and resorting to operating on blocks yields a block
permutation problem in that the index of each speaker may change between blocks.
We here propose the joint modeling of spatial and spectral features to solve the
block permutation problem and generalize DANs to multi-channel meeting recordings:
The DAN acts as a spectral feature extractor for a subsequent model-based clustering
approach. We first analyze different joint models in batch-processing scenarios
and finally propose a block-online blind source separation algorithm. The efficacy
of the proposed models is demonstrated on reverberant mixtures corrupted by real
recordings of multi-channel background noise. We demonstrate that both the proposed
batch-processing and the proposed block-online system outperform (a) a spatial-only
model with a state-of-the-art frequency permutation solver and (b) a spectral-only
model with an oracle block permutation solver in terms of signal to distortion
ratio (SDR) gains.'
author:
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Takuya
full_name: Higuchi, Takuya
last_name: Higuchi
- first_name: Keisuke
full_name: Kinoshita, Keisuke
last_name: Kinoshita
- first_name: Tomohiro
full_name: Nakatani, Tomohiro
last_name: Nakatani
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Drude L, Higuchi T, Kinoshita K, Nakatani T, Haeb-Umbach R. Dual Frequency-
and Block-Permutation Alignment for Deep Learning Based Block-Online Blind Source
Separation. In: ICASSP 2018, Calgary, Canada. ; 2018.'
apa: Drude, L., Higuchi, T., Kinoshita, K., Nakatani, T., & Haeb-Umbach,
R. (2018). Dual Frequency- and Block-Permutation Alignment for Deep Learning Based
Block-Online Blind Source Separation. In ICASSP 2018, Calgary, Canada.
bibtex: '@inproceedings{Drude_Higuchi_Kinoshita_Nakatani_Haeb-Umbach_2018, title={Dual
Frequency- and Block-Permutation Alignment for Deep Learning Based Block-Online
Blind Source Separation}, booktitle={ICASSP 2018, Calgary, Canada}, author={Drude,
Lukas and Higuchi, Takuya and Kinoshita, Keisuke and Nakatani, Tomohiro and
Haeb-Umbach, Reinhold}, year={2018} }'
chicago: Drude, Lukas, Takuya Higuchi, Keisuke Kinoshita, Tomohiro Nakatani,
and Reinhold Haeb-Umbach. “Dual Frequency- and Block-Permutation Alignment for
Deep Learning Based Block-Online Blind Source Separation.” In ICASSP 2018,
Calgary, Canada, 2018.
ieee: L. Drude, T. Higuchi, K. Kinoshita, T. Nakatani, and R. Haeb-Umbach,
“Dual Frequency- and Block-Permutation Alignment for Deep Learning Based Block-Online
Blind Source Separation,” in ICASSP 2018, Calgary, Canada, 2018.
mla: Drude, Lukas, et al. “Dual Frequency- and Block-Permutation Alignment for Deep
Learning Based Block-Online Blind Source Separation.” ICASSP 2018, Calgary,
Canada, 2018.
short: 'L. Drude, T. Higuchi, K. Kinoshita, T. Nakatani, R. Haeb-Umbach, in:
ICASSP 2018, Calgary, Canada, 2018.'
date_created: 2019-07-30T14:42:15Z
date_updated: 2022-01-06T06:51:24Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2018/ICASSP_2018_Drude_Paper.pdf
oa: '1'
publication: ICASSP 2018, Calgary, Canada
related_material:
link:
- description: Poster
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2018/ICASSP_2018_Drude_Poster.pdf
status: public
title: Dual Frequency- and Block-Permutation Alignment for Deep Learning Based Block-Online
Blind Source Separation
type: conference
user_id: '44006'
year: '2018'
...
---
_id: '12899'
abstract:
- lang: eng
text: This contribution presents a speech enhancement system for the CHiME-5 Dinner
Party Scenario. The front-end employs multi-channel linear time-variant filtering
and achieves its gains without the use of a neural network. We present an adaptation
of blind source separation techniques to the CHiME-5 database which we call Guided
Source Separation (GSS). Using the baseline acoustic and language model, the combination
of Weighted Prediction Error based dereverberation, guided source separation,
and beamforming reduces the WER by 10.54% (relative) for the single array track
and by 21.12% (relative) on the multiple array track.
author:
- first_name: Christoph
full_name: Boeddeker, Christoph
id: '40767'
last_name: Boeddeker
- first_name: Jens
full_name: Heitkaemper, Jens
id: '27643'
last_name: Heitkaemper
- first_name: Joerg
full_name: Schmalenstroeer, Joerg
id: '460'
last_name: Schmalenstroeer
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Jahn
full_name: Heymann, Jahn
last_name: Heymann
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Boeddeker C, Heitkaemper J, Schmalenstroeer J, Drude L, Heymann J, Haeb-Umbach
R. Front-End Processing for the CHiME-5 Dinner Party Scenario. In: Proc. CHiME
2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India.
; 2018.'
apa: Boeddeker, C., Heitkaemper, J., Schmalenstroeer, J., Drude, L., Heymann, J.,
& Haeb-Umbach, R. (2018). Front-End Processing for the CHiME-5 Dinner Party
Scenario. Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments,
Hyderabad, India.
bibtex: '@inproceedings{Boeddeker_Heitkaemper_Schmalenstroeer_Drude_Heymann_Haeb-Umbach_2018,
title={Front-End Processing for the CHiME-5 Dinner Party Scenario}, booktitle={Proc.
CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad,
India}, author={Boeddeker, Christoph and Heitkaemper, Jens and Schmalenstroeer,
Joerg and Drude, Lukas and Heymann, Jahn and Haeb-Umbach, Reinhold}, year={2018}
}'
chicago: Boeddeker, Christoph, Jens Heitkaemper, Joerg Schmalenstroeer, Lukas Drude,
Jahn Heymann, and Reinhold Haeb-Umbach. “Front-End Processing for the CHiME-5
Dinner Party Scenario.” In Proc. CHiME 2018 Workshop on Speech Processing in
Everyday Environments, Hyderabad, India, 2018.
ieee: C. Boeddeker, J. Heitkaemper, J. Schmalenstroeer, L. Drude, J. Heymann, and
R. Haeb-Umbach, “Front-End Processing for the CHiME-5 Dinner Party Scenario,”
2018.
mla: Boeddeker, Christoph, et al. “Front-End Processing for the CHiME-5 Dinner Party
Scenario.” Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments,
Hyderabad, India, 2018.
short: 'C. Boeddeker, J. Heitkaemper, J. Schmalenstroeer, L. Drude, J. Heymann,
R. Haeb-Umbach, in: Proc. CHiME 2018 Workshop on Speech Processing in Everyday
Environments, Hyderabad, India, 2018.'
date_created: 2019-07-30T14:35:15Z
date_updated: 2023-10-26T08:14:15Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2018/INTERSPEECH_2018_Heitkaemper_Paper.pdf
oa: '1'
project:
- _id: '52'
name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments,
Hyderabad, India
quality_controlled: '1'
related_material:
link:
- description: Poster
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2018/INTERSPEECH_2018_Heitkaemper_Poster.pdf
status: public
title: Front-End Processing for the CHiME-5 Dinner Party Scenario
type: conference
user_id: '460'
year: '2018'
...
---
_id: '11876'
abstract:
- lang: eng
text: This paper describes the systems for the single-array track and the multiple-array
track of the 5th CHiME Challenge. The final system is a combination of multiple
systems, using Confusion Network Combination (CNC). The different systems presented
here use different front-ends and training sets for a Bidirectional
Long Short-Term Memory (BLSTM) Acoustic Model (AM). The front-end was replaced
by enhancements provided by Paderborn University [1]. The back-end has been implemented
using RASR [2] and RETURNN [3]. Additionally, a system combination including the
hypothesis word graphs from the system of the submission [1] has been performed,
which results in the final best system.
author:
- first_name: Markus
full_name: Kitza, Markus
last_name: Kitza
- first_name: Wilfried
full_name: Michel, Wilfried
last_name: Michel
- first_name: Christoph
full_name: Boeddeker, Christoph
id: '40767'
last_name: Boeddeker
- first_name: Jens
full_name: Heitkaemper, Jens
id: '27643'
last_name: Heitkaemper
- first_name: Tobias
full_name: Menne, Tobias
last_name: Menne
- first_name: Ralf
full_name: Schlüter, Ralf
last_name: Schlüter
- first_name: Hermann
full_name: Ney, Hermann
last_name: Ney
- first_name: Joerg
full_name: Schmalenstroeer, Joerg
id: '460'
last_name: Schmalenstroeer
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Kitza M, Michel W, Boeddeker C, et al. The RWTH/UPB System Combination for
the CHiME 2018 Workshop. In: Proc. CHiME 2018 Workshop on Speech Processing
in Everyday Environments, Hyderabad, India. ; 2018.'
apa: Kitza, M., Michel, W., Boeddeker, C., Heitkaemper, J., Menne, T., Schlüter,
R., Ney, H., Schmalenstroeer, J., Drude, L., Heymann, J., & Haeb-Umbach, R.
(2018). The RWTH/UPB System Combination for the CHiME 2018 Workshop. Proc.
CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad,
India.
bibtex: '@inproceedings{Kitza_Michel_Boeddeker_Heitkaemper_Menne_Schlüter_Ney_Schmalenstroeer_Drude_Heymann_et
al._2018, title={The RWTH/UPB System Combination for the CHiME 2018 Workshop},
booktitle={Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments,
Hyderabad, India}, author={Kitza, Markus and Michel, Wilfried and Boeddeker, Christoph
and Heitkaemper, Jens and Menne, Tobias and Schlüter, Ralf and Ney, Hermann and
Schmalenstroeer, Joerg and Drude, Lukas and Heymann, Jahn and et al.}, year={2018}
}'
chicago: Kitza, Markus, Wilfried Michel, Christoph Boeddeker, Jens Heitkaemper,
Tobias Menne, Ralf Schlüter, Hermann Ney, et al. “The RWTH/UPB System Combination
for the CHiME 2018 Workshop.” In Proc. CHiME 2018 Workshop on Speech Processing
in Everyday Environments, Hyderabad, India, 2018.
ieee: M. Kitza et al., “The RWTH/UPB System Combination for the CHiME 2018
Workshop,” 2018.
mla: Kitza, Markus, et al. “The RWTH/UPB System Combination for the CHiME 2018 Workshop.”
Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments, Hyderabad,
India, 2018.
short: 'M. Kitza, W. Michel, C. Boeddeker, J. Heitkaemper, T. Menne, R. Schlüter,
H. Ney, J. Schmalenstroeer, L. Drude, J. Heymann, R. Haeb-Umbach, in: Proc. CHiME
2018 Workshop on Speech Processing in Everyday Environments, Hyderabad, India,
2018.'
date_created: 2019-07-12T05:29:58Z
date_updated: 2023-10-26T08:12:14Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2018/INTERSPEECH_2018_Heitkaemper_RWTH_Paper.pdf
oa: '1'
publication: Proc. CHiME 2018 Workshop on Speech Processing in Everyday Environments,
Hyderabad, India
quality_controlled: '1'
status: public
title: The RWTH/UPB System Combination for the CHiME 2018 Workshop
type: conference
user_id: '460'
year: '2018'
...
---
_id: '11735'
abstract:
- lang: eng
text: This report describes the computation of gradients by algorithmic differentiation
for statistically optimum beamforming operations. In particular, the differentiation of
complex-valued functions is a key component of this approach. Therefore, real-valued
algorithmic differentiation is extended via the complex-valued chain rule. In
addition to the basic mathematical operations, the derivative of the eigenvalue problem
with complex-valued eigenvectors is one of the key results of this report. The
potential of this approach is shown with experimental results on the CHiME-3 challenge
database. There, the beamforming task is used as a front-end for an ASR system.
With the developed derivatives, a joint optimization of a speech enhancement and
speech recognition system w.r.t. the recognition optimization criterion is possible.
author:
- first_name: Christoph
full_name: Boeddeker, Christoph
id: '40767'
last_name: Boeddeker
- first_name: Patrick
full_name: Hanebrink, Patrick
last_name: Hanebrink
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: Boeddeker C, Hanebrink P, Drude L, Heymann J, Haeb-Umbach R. On the Computation
of Complex-Valued Gradients with Application to Statistically Optimum Beamforming.;
2017.
apa: Boeddeker, C., Hanebrink, P., Drude, L., Heymann, J., & Haeb-Umbach, R.
(2017). On the Computation of Complex-valued Gradients with Application to
Statistically Optimum Beamforming.
bibtex: '@book{Boeddeker_Hanebrink_Drude_Heymann_Haeb-Umbach_2017, title={On the
Computation of Complex-valued Gradients with Application to Statistically Optimum
Beamforming}, author={Boeddeker, Christoph and Hanebrink, Patrick and Drude, Lukas
and Heymann, Jahn and Haeb-Umbach, Reinhold}, year={2017} }'
chicago: Boeddeker, Christoph, Patrick Hanebrink, Lukas Drude, Jahn Heymann, and
Reinhold Haeb-Umbach. On the Computation of Complex-Valued Gradients with Application
to Statistically Optimum Beamforming, 2017.
ieee: C. Boeddeker, P. Hanebrink, L. Drude, J. Heymann, and R. Haeb-Umbach, On
the Computation of Complex-valued Gradients with Application to Statistically
Optimum Beamforming. 2017.
mla: Boeddeker, Christoph, et al. On the Computation of Complex-Valued Gradients
with Application to Statistically Optimum Beamforming. 2017.
short: C. Boeddeker, P. Hanebrink, L. Drude, J. Heymann, R. Haeb-Umbach, On the
Computation of Complex-Valued Gradients with Application to Statistically Optimum
Beamforming, 2017.
date_created: 2019-07-12T05:27:15Z
date_updated: 2022-01-06T06:51:08Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2017/ArXiv_2017_BoeddekerHanebrinkHaeb_Article.pdf
oa: '1'
status: public
title: On the Computation of Complex-valued Gradients with Application to Statistically
Optimum Beamforming
type: report
user_id: '40767'
year: '2017'
...
---
_id: '11736'
abstract:
- lang: eng
text: In this paper we show how a neural network for spectral mask estimation for
an acoustic beamformer can be optimized by algorithmic differentiation. Using
the beamformer output SNR as the objective function to maximize, the gradient
is propagated through the beamformer all the way to the neural network which provides
the clean speech and noise masks from which the beamformer coefficients are estimated
by eigenvalue decomposition. A key theoretical result is the derivative of an
eigenvalue problem involving complex-valued eigenvectors. Experimental results
on the CHiME-3 challenge database demonstrate the effectiveness of the approach.
The tools developed in this paper are a key component for an end-to-end optimization
of speech enhancement and speech recognition.
author:
- first_name: Christoph
full_name: Boeddeker, Christoph
id: '40767'
last_name: Boeddeker
- first_name: Patrick
full_name: Hanebrink, Patrick
last_name: Hanebrink
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Boeddeker C, Hanebrink P, Drude L, Heymann J, Haeb-Umbach R. Optimizing Neural-Network
Supported Acoustic Beamforming by Algorithmic Differentiation. In: Proc. IEEE
Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP). ; 2017.'
apa: Boeddeker, C., Hanebrink, P., Drude, L., Heymann, J., & Haeb-Umbach, R.
(2017). Optimizing Neural-Network Supported Acoustic Beamforming by Algorithmic
Differentiation. In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal
Processing (ICASSP).
bibtex: '@inproceedings{Boeddeker_Hanebrink_Drude_Heymann_Haeb-Umbach_2017, title={Optimizing
Neural-Network Supported Acoustic Beamforming by Algorithmic Differentiation},
booktitle={Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)},
author={Boeddeker, Christoph and Hanebrink, Patrick and Drude, Lukas and Heymann,
Jahn and Haeb-Umbach, Reinhold}, year={2017} }'
chicago: Boeddeker, Christoph, Patrick Hanebrink, Lukas Drude, Jahn Heymann, and
Reinhold Haeb-Umbach. “Optimizing Neural-Network Supported Acoustic Beamforming
by Algorithmic Differentiation.” In Proc. IEEE Intl. Conf. on Acoustics, Speech
and Signal Processing (ICASSP), 2017.
ieee: C. Boeddeker, P. Hanebrink, L. Drude, J. Heymann, and R. Haeb-Umbach, “Optimizing
Neural-Network Supported Acoustic Beamforming by Algorithmic Differentiation,”
in Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP),
2017.
mla: Boeddeker, Christoph, et al. “Optimizing Neural-Network Supported Acoustic
Beamforming by Algorithmic Differentiation.” Proc. IEEE Intl. Conf. on Acoustics,
Speech and Signal Processing (ICASSP), 2017.
short: 'C. Boeddeker, P. Hanebrink, L. Drude, J. Heymann, R. Haeb-Umbach, in: Proc.
IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2017.'
date_created: 2019-07-12T05:27:16Z
date_updated: 2022-01-06T06:51:08Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2017/icassp_2017_boeddeker_paper.pdf
oa: '1'
publication: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)
status: public
title: Optimizing Neural-Network Supported Acoustic Beamforming by Algorithmic Differentiation
type: conference
user_id: '44006'
year: '2017'
...
---
_id: '11754'
abstract:
- lang: eng
text: Recent advances in discriminatively trained mask estimation networks to extract
a single source utilizing beamforming techniques demonstrate that the integration
of statistical models and deep neural networks (DNNs) is a promising approach
for robust automatic speech recognition (ASR) applications. In this contribution
we demonstrate how discriminatively trained embeddings on spectral features can
be tightly integrated into statistical model-based source separation to separate
and transcribe overlapping speech. Good generalization to unseen spatial configurations
is achieved by estimating a statistical model at test time, while still leveraging
discriminative training of deep clustering embeddings on a separate training set.
We formulate an expectation maximization (EM) algorithm which jointly estimates
a model for deep clustering embeddings and complex-valued spatial observations
in the short-time Fourier transform (STFT) domain at test time. Extensive simulations
confirm that the integrated model outperforms (a) a deep clustering model with
a subsequent beamforming step and (b) an EM-based model with a beamforming step
alone in terms of signal to distortion ratio (SDR) and perceptually motivated
metric (PESQ) gains. ASR results on a reverberated dataset further show that
the aforementioned gains translate to reduced word error rates (WERs) even in
reverberant environments.
author:
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Drude L, Haeb-Umbach R. Tight integration of spatial and spectral features
for BSS with Deep Clustering embeddings. In: INTERSPEECH 2017, Stockholm, Sweden.
; 2017.'
apa: Drude, L., & Haeb-Umbach, R. (2017). Tight integration of spatial and spectral
features for BSS with Deep Clustering embeddings. In INTERSPEECH 2017, Stockholm,
Sweden.
bibtex: '@inproceedings{Drude_Haeb-Umbach_2017, title={Tight integration of spatial
and spectral features for BSS with Deep Clustering embeddings}, booktitle={INTERSPEECH
2017, Stockholm, Sweden}, author={Drude, Lukas and Haeb-Umbach, Reinhold}, year={2017}
}'
chicago: Drude, Lukas, and Reinhold Haeb-Umbach. “Tight Integration of Spatial and
Spectral Features for BSS with Deep Clustering Embeddings.” In INTERSPEECH
2017, Stockholm, Sweden, 2017.
ieee: L. Drude and R. Haeb-Umbach, “Tight integration of spatial and spectral features
for BSS with Deep Clustering embeddings,” in INTERSPEECH 2017, Stockholm, Sweden,
2017.
mla: Drude, Lukas, and Reinhold Haeb-Umbach. “Tight Integration of Spatial and Spectral
Features for BSS with Deep Clustering Embeddings.” INTERSPEECH 2017, Stockholm,
Sweden, 2017.
short: 'L. Drude, R. Haeb-Umbach, in: INTERSPEECH 2017, Stockholm, Sweden, 2017.'
date_created: 2019-07-12T05:27:37Z
date_updated: 2022-01-06T06:51:08Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2017/INTERSPEECH_2017_Drude_paper.pdf
oa: '1'
publication: INTERSPEECH 2017, Stockholm, Sweden
related_material:
link:
- description: Slides
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2017/INTERSPEECH_2017_Drude_slides.pdf
status: public
title: Tight integration of spatial and spectral features for BSS with Deep Clustering
embeddings
type: conference
user_id: '44006'
year: '2017'
...
---
_id: '11809'
abstract:
- lang: eng
text: This paper presents an end-to-end training approach for a beamformer-supported
multi-channel ASR system. A neural network which estimates masks for a statistically
optimum beamformer is jointly trained with a network for acoustic modeling. To
update its parameters, we propagate the gradients from the acoustic model all
the way through feature extraction and the complex-valued beamforming operation.
Besides avoiding a mismatch between the front-end and the back-end, this approach
also eliminates the need for stereo data, i.e., the parallel availability of clean
and noisy versions of the signals. Instead, it can be trained with real noisy
multichannel data only. Also, relying on the signal statistics for beamforming,
the approach makes no assumptions on the configuration of the microphone array.
We further observe a performance gain through joint training in terms of word
error rate in an evaluation of the system on the CHiME 4 dataset.
author:
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Christoph
full_name: Boeddeker, Christoph
id: '40767'
last_name: Boeddeker
- first_name: Patrick
full_name: Hanebrink, Patrick
last_name: Hanebrink
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Heymann J, Drude L, Boeddeker C, Hanebrink P, Haeb-Umbach R. BEAMNET: End-to-End
Training of a Beamformer-Supported Multi-Channel ASR System. In: Proc. IEEE
Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP). ; 2017.'
apa: 'Heymann, J., Drude, L., Boeddeker, C., Hanebrink, P., & Haeb-Umbach, R.
(2017). BEAMNET: End-to-End Training of a Beamformer-Supported Multi-Channel ASR
System. In Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing
(ICASSP).'
bibtex: '@inproceedings{Heymann_Drude_Boeddeker_Hanebrink_Haeb-Umbach_2017, title={BEAMNET:
End-to-End Training of a Beamformer-Supported Multi-Channel ASR System}, booktitle={Proc.
IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)}, author={Heymann,
Jahn and Drude, Lukas and Boeddeker, Christoph and Hanebrink, Patrick and Haeb-Umbach,
Reinhold}, year={2017} }'
chicago: 'Heymann, Jahn, Lukas Drude, Christoph Boeddeker, Patrick Hanebrink, and
Reinhold Haeb-Umbach. “BEAMNET: End-to-End Training of a Beamformer-Supported
Multi-Channel ASR System.” In Proc. IEEE Intl. Conf. on Acoustics, Speech and
Signal Processing (ICASSP), 2017.'
ieee: 'J. Heymann, L. Drude, C. Boeddeker, P. Hanebrink, and R. Haeb-Umbach, “BEAMNET:
End-to-End Training of a Beamformer-Supported Multi-Channel ASR System,” in Proc.
IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2017.'
mla: 'Heymann, Jahn, et al. “BEAMNET: End-to-End Training of a Beamformer-Supported
Multi-Channel ASR System.” Proc. IEEE Intl. Conf. on Acoustics, Speech and
Signal Processing (ICASSP), 2017.'
short: 'J. Heymann, L. Drude, C. Boeddeker, P. Hanebrink, R. Haeb-Umbach, in: Proc.
IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2017.'
date_created: 2019-07-12T05:28:40Z
date_updated: 2022-01-06T06:51:09Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2017/icassp_2017_heymann_paper.pdf
oa: '1'
project:
- _id: '52'
name: Computing Resources Provided by the Paderborn Center for Parallel Computing
publication: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)
related_material:
link:
- description: Poster
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2017/icassp_2017_heymann_poster.pdf
status: public
title: 'BEAMNET: End-to-End Training of a Beamformer-Supported Multi-Channel ASR System'
type: conference
user_id: '40767'
year: '2017'
...
---
_id: '11811'
abstract:
- lang: eng
text: 'Acoustic beamforming can greatly improve the performance of Automatic Speech
Recognition (ASR) and speech enhancement systems when multiple channels are available.
We recently proposed a way to support the model-based Generalized Eigenvalue beamforming
operation with a powerful neural network for spectral mask estimation. The enhancement
system has a number of desirable properties. In particular, neither assumptions
need to be made about the nature of the acoustic transfer function (e.g., being
anechoic), nor does the array configuration need to be known. While the system
has been originally developed to enhance speech in noisy environments, we show
in this article that it is also effective in suppressing reverberation, thus leading
to a generic trainable multi-channel speech enhancement system for robust speech
processing. To support this claim, we consider two distinct datasets: The CHiME
3 challenge, which features challenging real-world noise distortions, and the
Reverb challenge, which focuses on distortions caused by reverberation. We evaluate
the system both with respect to a speech enhancement and a recognition task. For
the first task we propose a new way to cope with the distortions introduced by
the Generalized Eigenvalue beamformer by renormalizing the target energy for each
frequency bin, and measure its effectiveness in terms of the PESQ score. For the
latter we feed the enhanced signal to a strong DNN back-end and achieve state-of-the-art
ASR results on both datasets. We further experiment with different network architectures
for spectral mask estimation: One small feed-forward network with only one hidden
layer, one Convolutional Neural Network and one bi-directional Long Short-Term
Memory network, showing that even a small network is capable of delivering significant
performance improvements.'
author:
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: Heymann J, Drude L, Haeb-Umbach R. A Generic Neural Acoustic Beamforming Architecture
for Robust Multi-Channel Speech Processing. Computer Speech and Language.
2017.
apa: Heymann, J., Drude, L., & Haeb-Umbach, R. (2017). A Generic Neural Acoustic
Beamforming Architecture for Robust Multi-Channel Speech Processing. Computer
Speech and Language.
bibtex: '@article{Heymann_Drude_Haeb-Umbach_2017, title={A Generic Neural Acoustic
Beamforming Architecture for Robust Multi-Channel Speech Processing}, journal={Computer
Speech and Language}, author={Heymann, Jahn and Drude, Lukas and Haeb-Umbach,
Reinhold}, year={2017} }'
chicago: Heymann, Jahn, Lukas Drude, and Reinhold Haeb-Umbach. “A Generic Neural
Acoustic Beamforming Architecture for Robust Multi-Channel Speech Processing.”
Computer Speech and Language, 2017.
ieee: J. Heymann, L. Drude, and R. Haeb-Umbach, “A Generic Neural Acoustic Beamforming
Architecture for Robust Multi-Channel Speech Processing,” Computer Speech and
Language, 2017.
mla: Heymann, Jahn, et al. “A Generic Neural Acoustic Beamforming Architecture for
Robust Multi-Channel Speech Processing.” Computer Speech and Language,
2017.
short: J. Heymann, L. Drude, R. Haeb-Umbach, Computer Speech and Language (2017).
date_created: 2019-07-12T05:28:43Z
date_updated: 2022-01-06T06:51:09Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2017/ComputerSpeechLanguage_2017_heymann_paper.pdf
oa: '1'
publication: Computer Speech and Language
status: public
title: A Generic Neural Acoustic Beamforming Architecture for Robust Multi-Channel
Speech Processing
type: journal_article
user_id: '44006'
year: '2017'
...
---
_id: '11759'
abstract:
- lang: eng
text: 'Variational Autoencoders (VAEs) have been shown to provide efficient neural-network-based
approximate Bayesian inference for observation models for which exact inference
is intractable. Their extension, the so-called Structured VAE (SVAE), allows inference
in the presence of both discrete and continuous latent variables. Inspired by
this extension, we developed a VAE with Hidden Markov Models (HMMs) as latent
models. We applied the resulting HMM-VAE to the task of acoustic unit discovery
in a zero resource scenario. Starting from an initial model based on variational
inference in an HMM with Gaussian Mixture Model (GMM) emission probabilities,
the accuracy of the acoustic unit discovery could be significantly improved by
the HMM-VAE. In doing so we were able to demonstrate for an unsupervised learning
task what is well-known in the supervised learning case: Neural networks provide
superior modeling power compared to GMMs.'
author:
- first_name: Janek
full_name: Ebbers, Janek
id: '34851'
last_name: Ebbers
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Thomas
full_name: Glarner, Thomas
id: '14169'
last_name: Glarner
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
- first_name: Bhiksha
full_name: Raj, Bhiksha
last_name: Raj
citation:
ama: 'Ebbers J, Heymann J, Drude L, Glarner T, Haeb-Umbach R, Raj B. Hidden Markov
Model Variational Autoencoder for Acoustic Unit Discovery. In: INTERSPEECH
2017, Stockholm, Schweden. ; 2017.'
apa: Ebbers, J., Heymann, J., Drude, L., Glarner, T., Haeb-Umbach, R., & Raj,
B. (2017). Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery.
INTERSPEECH 2017, Stockholm, Schweden.
bibtex: '@inproceedings{Ebbers_Heymann_Drude_Glarner_Haeb-Umbach_Raj_2017, title={Hidden
Markov Model Variational Autoencoder for Acoustic Unit Discovery}, booktitle={INTERSPEECH
2017, Stockholm, Schweden}, author={Ebbers, Janek and Heymann, Jahn and Drude,
Lukas and Glarner, Thomas and Haeb-Umbach, Reinhold and Raj, Bhiksha}, year={2017}
}'
chicago: Ebbers, Janek, Jahn Heymann, Lukas Drude, Thomas Glarner, Reinhold Haeb-Umbach,
and Bhiksha Raj. “Hidden Markov Model Variational Autoencoder for Acoustic Unit
Discovery.” In INTERSPEECH 2017, Stockholm, Schweden, 2017.
ieee: J. Ebbers, J. Heymann, L. Drude, T. Glarner, R. Haeb-Umbach, and B. Raj, “Hidden
Markov Model Variational Autoencoder for Acoustic Unit Discovery,” 2017.
mla: Ebbers, Janek, et al. “Hidden Markov Model Variational Autoencoder for Acoustic
Unit Discovery.” INTERSPEECH 2017, Stockholm, Schweden, 2017.
short: 'J. Ebbers, J. Heymann, L. Drude, T. Glarner, R. Haeb-Umbach, B. Raj, in:
INTERSPEECH 2017, Stockholm, Schweden, 2017.'
date_created: 2019-07-12T05:27:42Z
date_updated: 2023-11-22T08:29:06Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2017/INTERSPEECH_2017_Ebbers_paper.pdf
oa: '1'
publication: INTERSPEECH 2017, Stockholm, Schweden
quality_controlled: '1'
related_material:
link:
- description: Poster
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2017/INTERSPEECH_2017_Ebbers_poster.pdf
- description: Slides
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2017/INTERSPEECH_2017_Ebbers_slides.pdf
status: public
title: Hidden Markov Model Variational Autoencoder for Acoustic Unit Discovery
type: conference
user_id: '34851'
year: '2017'
...
---
_id: '11895'
abstract:
- lang: eng
text: Multi-channel speech enhancement algorithms rely on a synchronous sampling
of the microphone signals. This, however, cannot always be guaranteed, especially
if the sensors are distributed in an environment. To avoid performance degradation
the sampling rate offset needs to be estimated and compensated for. In this contribution
we extend the recently proposed coherence drift based method in two important
directions. First, the increasing phase shift in the short-time Fourier transform
domain is estimated from the coherence drift in a matched-filter-like fashion,
where intermediate estimates are weighted by their instantaneous SNR. Second,
an observed bias is removed by iterating between offset estimation and compensation
by resampling a couple of times. The effectiveness of the proposed method is demonstrated
by speech recognition results on the output of a beamformer with and without sampling
rate offset compensation between the input channels. We compare MVDR and maximum-SNR
beamformers in reverberant environments and further show that both benefit from
a novel phase normalization, which we also propose in this contribution.
author:
- first_name: Joerg
full_name: Schmalenstroeer, Joerg
id: '460'
last_name: Schmalenstroeer
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Christoph
full_name: Boeddeker, Christoph
id: '40767'
last_name: Boeddeker
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Schmalenstroeer J, Heymann J, Drude L, Boeddeker C, Haeb-Umbach R. Multi-Stage
Coherence Drift Based Sampling Rate Synchronization for Acoustic Beamforming.
In: IEEE 19th International Workshop on Multimedia Signal Processing (MMSP).
; 2017.'
apa: Schmalenstroeer, J., Heymann, J., Drude, L., Boeddeker, C., & Haeb-Umbach,
R. (2017). Multi-Stage Coherence Drift Based Sampling Rate Synchronization for
Acoustic Beamforming. IEEE 19th International Workshop on Multimedia Signal
Processing (MMSP).
bibtex: '@inproceedings{Schmalenstroeer_Heymann_Drude_Boeddeker_Haeb-Umbach_2017,
title={Multi-Stage Coherence Drift Based Sampling Rate Synchronization for Acoustic
Beamforming}, booktitle={IEEE 19th International Workshop on Multimedia Signal
Processing (MMSP)}, author={Schmalenstroeer, Joerg and Heymann, Jahn and Drude,
Lukas and Boeddeker, Christoph and Haeb-Umbach, Reinhold}, year={2017} }'
chicago: Schmalenstroeer, Joerg, Jahn Heymann, Lukas Drude, Christoph Boeddeker,
and Reinhold Haeb-Umbach. “Multi-Stage Coherence Drift Based Sampling Rate Synchronization
for Acoustic Beamforming.” In IEEE 19th International Workshop on Multimedia
Signal Processing (MMSP), 2017.
ieee: J. Schmalenstroeer, J. Heymann, L. Drude, C. Boeddeker, and R. Haeb-Umbach,
“Multi-Stage Coherence Drift Based Sampling Rate Synchronization for Acoustic
Beamforming,” 2017.
mla: Schmalenstroeer, Joerg, et al. “Multi-Stage Coherence Drift Based Sampling
Rate Synchronization for Acoustic Beamforming.” IEEE 19th International Workshop
on Multimedia Signal Processing (MMSP), 2017.
short: 'J. Schmalenstroeer, J. Heymann, L. Drude, C. Boeddeker, R. Haeb-Umbach,
in: IEEE 19th International Workshop on Multimedia Signal Processing (MMSP), 2017.'
date_created: 2019-07-12T05:30:20Z
date_updated: 2023-10-26T08:12:05Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2017/MMSP_2017_SchHaeb.pdf
oa: '1'
publication: IEEE 19th International Workshop on Multimedia Signal Processing (MMSP)
quality_controlled: '1'
related_material:
link:
- description: Poster
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2017/MMSP_2017_SchHaeb_poster.pdf
status: public
title: Multi-Stage Coherence Drift Based Sampling Rate Synchronization for Acoustic
Beamforming
type: conference
user_id: '460'
year: '2017'
...
---
_id: '11744'
abstract:
- lang: eng
text: Noise power spectral density (PSD) estimation is an indispensable component
of speech spectral enhancement systems. In this paper we present a noise PSD tracking
algorithm, which employs a noise presence probability estimate delivered by a
deep neural network (DNN). The algorithm provides a causal noise PSD estimate
and can thus be used in speech enhancement systems for communication purposes.
An extensive performance comparison has been carried out with ten causal state-of-the-art
noise tracking algorithms taken from the literature and categorized according to the applied
techniques. The experiments showed that the proposed DNN-based noise PSD tracker
outperforms all competing methods with respect to all tested performance measures,
which include the noise tracking performance and the performance of a speech enhancement
system employing the noise tracking component.
author:
- first_name: Aleksej
full_name: Chinaev, Aleksej
last_name: Chinaev
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Chinaev A, Heymann J, Drude L, Haeb-Umbach R. Noise-Presence-Probability-Based
Noise PSD Estimation by Using DNNs. In: 12. ITG Fachtagung Sprachkommunikation
(ITG 2016). ; 2016.'
apa: Chinaev, A., Heymann, J., Drude, L., & Haeb-Umbach, R. (2016). Noise-Presence-Probability-Based
Noise PSD Estimation by Using DNNs. In 12. ITG Fachtagung Sprachkommunikation
(ITG 2016).
bibtex: '@inproceedings{Chinaev_Heymann_Drude_Haeb-Umbach_2016, title={Noise-Presence-Probability-Based
Noise PSD Estimation by Using DNNs}, booktitle={12. ITG Fachtagung Sprachkommunikation
(ITG 2016)}, author={Chinaev, Aleksej and Heymann, Jahn and Drude, Lukas and Haeb-Umbach,
Reinhold}, year={2016} }'
chicago: Chinaev, Aleksej, Jahn Heymann, Lukas Drude, and Reinhold Haeb-Umbach.
“Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs.” In 12.
ITG Fachtagung Sprachkommunikation (ITG 2016), 2016.
ieee: A. Chinaev, J. Heymann, L. Drude, and R. Haeb-Umbach, “Noise-Presence-Probability-Based
Noise PSD Estimation by Using DNNs,” in 12. ITG Fachtagung Sprachkommunikation
(ITG 2016), 2016.
mla: Chinaev, Aleksej, et al. “Noise-Presence-Probability-Based Noise PSD Estimation
by Using DNNs.” 12. ITG Fachtagung Sprachkommunikation (ITG 2016), 2016.
short: 'A. Chinaev, J. Heymann, L. Drude, R. Haeb-Umbach, in: 12. ITG Fachtagung
Sprachkommunikation (ITG 2016), 2016.'
date_created: 2019-07-12T05:27:25Z
date_updated: 2022-01-06T06:51:08Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2016/ChHeyDrHa16.pdf
oa: '1'
publication: 12. ITG Fachtagung Sprachkommunikation (ITG 2016)
related_material:
link:
- description: Presentation
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2016/ChHeyDrHa16_Presentation.pdf
status: public
title: Noise-Presence-Probability-Based Noise PSD Estimation by Using DNNs
type: conference
user_id: '44006'
year: '2016'
...
---
_id: '11751'
author:
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Christoph
full_name: Boeddeker, Christoph
id: '40767'
last_name: Boeddeker
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Drude L, Boeddeker C, Haeb-Umbach R. Blind Speech Separation based on Complex
Spherical k-Mode Clustering. In: Proc. IEEE Intl. Conf. on Acoustics, Speech
and Signal Processing (ICASSP). ; 2016.'
apa: Drude, L., Boeddeker, C., & Haeb-Umbach, R. (2016). Blind Speech Separation
based on Complex Spherical k-Mode Clustering. In Proc. IEEE Intl. Conf. on
Acoustics, Speech and Signal Processing (ICASSP).
bibtex: '@inproceedings{Drude_Boeddeker_Haeb-Umbach_2016, title={Blind Speech Separation
based on Complex Spherical k-Mode Clustering}, booktitle={Proc. IEEE Intl. Conf.
on Acoustics, Speech and Signal Processing (ICASSP)}, author={Drude, Lukas and
Boeddeker, Christoph and Haeb-Umbach, Reinhold}, year={2016} }'
chicago: Drude, Lukas, Christoph Boeddeker, and Reinhold Haeb-Umbach. “Blind Speech
Separation Based on Complex Spherical K-Mode Clustering.” In Proc. IEEE Intl.
Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2016.
ieee: L. Drude, C. Boeddeker, and R. Haeb-Umbach, “Blind Speech Separation based
on Complex Spherical k-Mode Clustering,” in Proc. IEEE Intl. Conf. on Acoustics,
Speech and Signal Processing (ICASSP), 2016.
mla: Drude, Lukas, et al. “Blind Speech Separation Based on Complex Spherical K-Mode
Clustering.” Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing
(ICASSP), 2016.
short: 'L. Drude, C. Boeddeker, R. Haeb-Umbach, in: Proc. IEEE Intl. Conf. on Acoustics,
Speech and Signal Processing (ICASSP), 2016.'
date_created: 2019-07-12T05:27:33Z
date_updated: 2022-01-06T06:51:08Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2016/icassp_2016_drude_paper.pdf
oa: '1'
publication: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)
related_material:
link:
- description: Slides
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2016/icassp_2016_drude_slides.pdf
status: public
title: Blind Speech Separation based on Complex Spherical k-Mode Clustering
type: conference
user_id: '44006'
year: '2016'
...
---
_id: '11756'
abstract:
- lang: eng
text: Although complex-valued neural networks (CVNNs), networks which can operate
with complex arithmetic, have been around for a while, they have not been given
reconsideration since the breakthrough of deep network architectures. This paper
presents a critical assessment of whether the novel tool set of deep neural networks
(DNNs) should be extended to complex-valued arithmetic. Indeed, with DNNs making
inroads in speech enhancement tasks, the use of complex-valued input data, specifically
the short-time Fourier transform coefficients, is an obvious consideration. In
particular when it comes to performing tasks that heavily rely on phase information,
such as acoustic beamforming, complex-valued algorithms are omnipresent. In this
contribution we recapitulate backpropagation in CVNNs, develop complex-valued
network elements, such as the split-rectified non-linearity, and compare real-
and complex-valued networks on a beamforming task. We find that CVNNs hardly provide
a performance gain and conclude that the effort of developing the complex-valued
counterparts of the building blocks of modern deep or recurrent neural networks
can hardly be justified.
author:
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Bhiksha
full_name: Raj, Bhiksha
last_name: Raj
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Drude L, Raj B, Haeb-Umbach R. On the appropriateness of complex-valued neural
networks for speech enhancement. In: INTERSPEECH 2016, San Francisco, USA.
; 2016.'
apa: Drude, L., Raj, B., & Haeb-Umbach, R. (2016). On the appropriateness of
complex-valued neural networks for speech enhancement. In INTERSPEECH 2016,
San Francisco, USA.
bibtex: '@inproceedings{Drude_Raj_Haeb-Umbach_2016, title={On the appropriateness
of complex-valued neural networks for speech enhancement}, booktitle={INTERSPEECH
2016, San Francisco, USA}, author={Drude, Lukas and Raj, Bhiksha and Haeb-Umbach,
Reinhold}, year={2016} }'
chicago: Drude, Lukas, Bhiksha Raj, and Reinhold Haeb-Umbach. “On the Appropriateness
of Complex-Valued Neural Networks for Speech Enhancement.” In INTERSPEECH 2016,
San Francisco, USA, 2016.
ieee: L. Drude, B. Raj, and R. Haeb-Umbach, “On the appropriateness of complex-valued
neural networks for speech enhancement,” in INTERSPEECH 2016, San Francisco,
USA, 2016.
mla: Drude, Lukas, et al. “On the Appropriateness of Complex-Valued Neural Networks
for Speech Enhancement.” INTERSPEECH 2016, San Francisco, USA, 2016.
short: 'L. Drude, B. Raj, R. Haeb-Umbach, in: INTERSPEECH 2016, San Francisco, USA,
2016.'
date_created: 2019-07-12T05:27:39Z
date_updated: 2022-01-06T06:51:08Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2016/interspeech_2016_drude_paper.pdf
oa: '1'
publication: INTERSPEECH 2016, San Francisco, USA
related_material:
link:
- description: Poster
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2016/interspeech_2016_drude_slides.pdf
status: public
title: On the appropriateness of complex-valued neural networks for speech enhancement
type: conference
user_id: '44006'
year: '2016'
...
---
_id: '11771'
abstract:
- lang: eng
text: This paper is concerned with speech presence probability estimation employing
an explicit model of the temporal and spectral correlations of speech. An undirected
graphical model is introduced, based on a Factor Graph formulation. It is shown
that this undirected model cures some of the theoretical issues of an earlier
directed graphical model. Furthermore, we formulate a message passing inference
scheme based on an approximate graph factorization, identify this inference scheme
as a particular message passing schedule based on the turbo principle and suggest
further alternative schedules. The experiments show an improved performance over
speech presence probability estimation based on an IID assumption, and a slightly
better performance of the turbo schedule over the alternatives.
author:
- first_name: Thomas
full_name: Glarner, Thomas
id: '14169'
last_name: Glarner
- first_name: Mohammad
full_name: Mahdi Momenzadeh, Mohammad
last_name: Mahdi Momenzadeh
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Glarner T, Mahdi Momenzadeh M, Drude L, Haeb-Umbach R. Factor Graph Decoding
for Speech Presence Probability Estimation. In: 12. ITG Fachtagung Sprachkommunikation
(ITG 2016). ; 2016.'
apa: Glarner, T., Mahdi Momenzadeh, M., Drude, L., & Haeb-Umbach, R. (2016).
Factor Graph Decoding for Speech Presence Probability Estimation. In 12. ITG
Fachtagung Sprachkommunikation (ITG 2016).
bibtex: '@inproceedings{Glarner_Mahdi Momenzadeh_Drude_Haeb-Umbach_2016, title={Factor
Graph Decoding for Speech Presence Probability Estimation}, booktitle={12. ITG
Fachtagung Sprachkommunikation (ITG 2016)}, author={Glarner, Thomas and Mahdi
Momenzadeh, Mohammad and Drude, Lukas and Haeb-Umbach, Reinhold}, year={2016}
}'
chicago: Glarner, Thomas, Mohammad Mahdi Momenzadeh, Lukas Drude, and Reinhold Haeb-Umbach.
“Factor Graph Decoding for Speech Presence Probability Estimation.” In 12.
ITG Fachtagung Sprachkommunikation (ITG 2016), 2016.
ieee: T. Glarner, M. Mahdi Momenzadeh, L. Drude, and R. Haeb-Umbach, “Factor Graph
Decoding for Speech Presence Probability Estimation,” in 12. ITG Fachtagung
Sprachkommunikation (ITG 2016), 2016.
mla: Glarner, Thomas, et al. “Factor Graph Decoding for Speech Presence Probability
Estimation.” 12. ITG Fachtagung Sprachkommunikation (ITG 2016), 2016.
short: 'T. Glarner, M. Mahdi Momenzadeh, L. Drude, R. Haeb-Umbach, in: 12. ITG Fachtagung
Sprachkommunikation (ITG 2016), 2016.'
date_created: 2019-07-12T05:27:56Z
date_updated: 2022-01-06T06:51:08Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2016/itgspeech2016_08_Glarner.pdf
oa: '1'
publication: 12. ITG Fachtagung Sprachkommunikation (ITG 2016)
related_material:
link:
- description: Slides
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2016/itgspeech2016_08_Glarner_slides.pdf
status: public
title: Factor Graph Decoding for Speech Presence Probability Estimation
type: conference
user_id: '44006'
year: '2016'
...
---
_id: '11812'
author:
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Heymann J, Drude L, Haeb-Umbach R. Neural Network Based Spectral Mask Estimation
for Acoustic Beamforming. In: Proc. IEEE Intl. Conf. on Acoustics, Speech and
Signal Processing (ICASSP). ; 2016.'
apa: Heymann, J., Drude, L., & Haeb-Umbach, R. (2016). Neural Network Based
Spectral Mask Estimation for Acoustic Beamforming. In Proc. IEEE Intl. Conf.
on Acoustics, Speech and Signal Processing (ICASSP).
bibtex: '@inproceedings{Heymann_Drude_Haeb-Umbach_2016, title={Neural Network Based
Spectral Mask Estimation for Acoustic Beamforming}, booktitle={Proc. IEEE Intl.
Conf. on Acoustics, Speech and Signal Processing (ICASSP)}, author={Heymann, Jahn
and Drude, Lukas and Haeb-Umbach, Reinhold}, year={2016} }'
chicago: Heymann, Jahn, Lukas Drude, and Reinhold Haeb-Umbach. “Neural Network Based
Spectral Mask Estimation for Acoustic Beamforming.” In Proc. IEEE Intl. Conf.
on Acoustics, Speech and Signal Processing (ICASSP), 2016.
ieee: J. Heymann, L. Drude, and R. Haeb-Umbach, “Neural Network Based Spectral Mask
Estimation for Acoustic Beamforming,” in Proc. IEEE Intl. Conf. on Acoustics,
Speech and Signal Processing (ICASSP), 2016.
mla: Heymann, Jahn, et al. “Neural Network Based Spectral Mask Estimation for Acoustic
Beamforming.” Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing
(ICASSP), 2016.
short: 'J. Heymann, L. Drude, R. Haeb-Umbach, in: Proc. IEEE Intl. Conf. on Acoustics,
Speech and Signal Processing (ICASSP), 2016.'
date_created: 2019-07-12T05:28:44Z
date_updated: 2022-01-06T06:51:09Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2016/icassp_2016_heymann_paper.pdf
oa: '1'
publication: Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP)
related_material:
link:
- description: Slides
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2016/icassp_2016_heymann_slides.pdf
status: public
title: Neural Network Based Spectral Mask Estimation for Acoustic Beamforming
type: conference
user_id: '44006'
year: '2016'
...
---
_id: '11834'
abstract:
- lang: eng
text: We present a system for the 4th CHiME challenge which significantly increases
the performance for all three tracks with respect to the provided baseline system.
The front-end uses a bi-directional Long Short-Term Memory (BLSTM)-based neural
network to estimate signal statistics. These then steer a Generalized Eigenvalue
beamformer. The back-end consists of a 22 layer deep Wide Residual Network and
two extra BLSTM layers. Working on a whole utterance instead of frames allows
us to refine Batch-Normalization. We also train our own BLSTM-based language model.
Adding a discriminative speaker adaptation leads to further gains. The final system
achieves a word error rate on the six channel real test data of 3.48%. For the
two channel track we achieve 5.96% and for the one channel track 9.34%. This is
the best reported performance on the challenge achieved by a single system, i.e.,
a configuration that does not combine multiple systems. At the same time, our
system is independent of the microphone configuration. We can thus use the same
components for all three tracks.
author:
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Heymann J, Drude L, Haeb-Umbach R. Wide Residual BLSTM Network with Discriminative
Speaker Adaptation for Robust Speech Recognition. In: Computer Speech and Language.
; 2016.'
apa: Heymann, J., Drude, L., & Haeb-Umbach, R. (2016). Wide Residual BLSTM Network
with Discriminative Speaker Adaptation for Robust Speech Recognition. In Computer
Speech and Language.
bibtex: '@inproceedings{Heymann_Drude_Haeb-Umbach_2016, title={Wide Residual BLSTM
Network with Discriminative Speaker Adaptation for Robust Speech Recognition},
booktitle={Computer Speech and Language}, author={Heymann, Jahn and Drude, Lukas
and Haeb-Umbach, Reinhold}, year={2016} }'
chicago: Heymann, Jahn, Lukas Drude, and Reinhold Haeb-Umbach. “Wide Residual BLSTM
Network with Discriminative Speaker Adaptation for Robust Speech Recognition.”
In Computer Speech and Language, 2016.
ieee: J. Heymann, L. Drude, and R. Haeb-Umbach, “Wide Residual BLSTM Network with
Discriminative Speaker Adaptation for Robust Speech Recognition,” in Computer
Speech and Language, 2016.
mla: Heymann, Jahn, et al. “Wide Residual BLSTM Network with Discriminative Speaker
Adaptation for Robust Speech Recognition.” Computer Speech and Language,
2016.
short: 'J. Heymann, L. Drude, R. Haeb-Umbach, in: Computer Speech and Language,
2016.'
date_created: 2019-07-12T05:29:09Z
date_updated: 2022-01-06T06:51:11Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2016/chime4_upbonly_paper.pdf
oa: '1'
publication: Computer Speech and Language
related_material:
link:
- description: Poster
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2016/chime4_upbonly_poster.pdf
status: public
title: Wide Residual BLSTM Network with Discriminative Speaker Adaptation for Robust
Speech Recognition
type: conference
user_id: '44006'
year: '2016'
...
---
_id: '11908'
abstract:
- lang: eng
text: 'This paper describes automatic speech recognition (ASR) systems developed
jointly by RWTH, UPB and FORTH for the 1ch, 2ch and 6ch tracks of the 4th CHiME
Challenge. In the 2ch and 6ch tracks the final system output is obtained by a
Confusion Network Combination (CNC) of multiple systems. The Acoustic Model (AM)
is a deep neural network based on Bidirectional Long Short-Term Memory (BLSTM)
units. The systems differ by front ends and training sets used for the acoustic
training. The model for the 1ch track is trained without any preprocessing. For
each front end we trained and evaluated individual acoustic models. We compare
the ASR performance of different beamforming approaches: a conventional superdirective
beamformer [1] and an MVDR beamformer as in [2], where the steering vector is
estimated based on [3]. Furthermore we evaluated a BLSTM supported Generalized
Eigenvalue beamformer using NN-GEV [4]. The back end is implemented using RWTH’s
open-source toolkits RASR [5], RETURNN [6] and rwthlm [7]. We rescore lattices
with a Long Short-Term Memory (LSTM) based language model. The overall best results
are obtained by a system combination that includes the lattices from the system
of UPB’s submission [8]. Our final submission scored second in each of the three
tracks of the 4th CHiME Challenge.'
author:
- first_name: Tobias
full_name: Menne, Tobias
last_name: Menne
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Anastasios
full_name: Alexandridis, Anastasios
last_name: Alexandridis
- first_name: Kazuki
full_name: Irie, Kazuki
last_name: Irie
- first_name: Albert
full_name: Zeyer, Albert
last_name: Zeyer
- first_name: Markus
full_name: Kitza, Markus
last_name: Kitza
- first_name: Pavel
full_name: Golik, Pavel
last_name: Golik
- first_name: Ilia
full_name: Kulikov, Ilia
last_name: Kulikov
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Ralf
full_name: Schlüter, Ralf
last_name: Schlüter
- first_name: Hermann
full_name: Ney, Hermann
last_name: Ney
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
- first_name: Athanasios
full_name: Mouchtaris, Athanasios
last_name: Mouchtaris
citation:
ama: 'Menne T, Heymann J, Alexandridis A, et al. The RWTH/UPB/FORTH System Combination
for the 4th CHiME Challenge Evaluation. In: Computer Speech and Language.
; 2016.'
apa: Menne, T., Heymann, J., Alexandridis, A., Irie, K., Zeyer, A., Kitza, M., …
Mouchtaris, A. (2016). The RWTH/UPB/FORTH System Combination for the 4th CHiME
Challenge Evaluation. In Computer Speech and Language.
bibtex: '@inproceedings{Menne_Heymann_Alexandridis_Irie_Zeyer_Kitza_Golik_Kulikov_Drude_Schlüter_et
al._2016, title={The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge
Evaluation}, booktitle={Computer Speech and Language}, author={Menne, Tobias and
Heymann, Jahn and Alexandridis, Anastasios and Irie, Kazuki and Zeyer, Albert
and Kitza, Markus and Golik, Pavel and Kulikov, Ilia and Drude, Lukas and Schlüter,
Ralf and et al.}, year={2016} }'
chicago: Menne, Tobias, Jahn Heymann, Anastasios Alexandridis, Kazuki Irie, Albert
Zeyer, Markus Kitza, Pavel Golik, et al. “The RWTH/UPB/FORTH System Combination
for the 4th CHiME Challenge Evaluation.” In Computer Speech and Language,
2016.
ieee: T. Menne et al., “The RWTH/UPB/FORTH System Combination for the 4th
CHiME Challenge Evaluation,” in Computer Speech and Language, 2016.
mla: Menne, Tobias, et al. “The RWTH/UPB/FORTH System Combination for the 4th CHiME
Challenge Evaluation.” Computer Speech and Language, 2016.
short: 'T. Menne, J. Heymann, A. Alexandridis, K. Irie, A. Zeyer, M. Kitza, P. Golik,
I. Kulikov, L. Drude, R. Schlüter, H. Ney, R. Haeb-Umbach, A. Mouchtaris, in:
Computer Speech and Language, 2016.'
date_created: 2019-07-12T05:30:35Z
date_updated: 2022-01-06T06:51:12Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2016/chime4_rwthupbforth_paper.pdf
oa: '1'
publication: Computer Speech and Language
status: public
title: The RWTH/UPB/FORTH System Combination for the 4th CHiME Challenge Evaluation
type: conference
user_id: '44006'
year: '2016'
...
---
_id: '11755'
abstract:
- lang: eng
text: This contribution presents a Direction of Arrival (DoA) estimation algorithm
based on the complex Watson distribution to incorporate both phase and level differences
of captured microphone array signals. The derived algorithm is reviewed in the
context of the Generalized State Coherence Transform (GSCT) on the one hand and
a kernel density estimation method on the other hand. A thorough simulative evaluation
yields insight into parameter selection and provides details on the performance
for both directional and omni-directional microphones. A comparison to the well-known
Steered Response Power with Phase Transform (SRP-PHAT) algorithm and a state-of-the-art
DoA estimator that explicitly accounts for aliasing shows, in particular,
the advantages of the presented algorithm if inter-sensor level differences are indicative
of the DoA, as with directional microphones.
author:
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Florian
full_name: Jacob, Florian
last_name: Jacob
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Drude L, Jacob F, Haeb-Umbach R. DOA-Estimation based on a Complex Watson
Kernel Method. In: 23rd European Signal Processing Conference (EUSIPCO 2015).
; 2015.'
apa: Drude, L., Jacob, F., & Haeb-Umbach, R. (2015). DOA-Estimation based on
a Complex Watson Kernel Method. In 23rd European Signal Processing Conference
(EUSIPCO 2015).
bibtex: '@inproceedings{Drude_Jacob_Haeb-Umbach_2015, title={DOA-Estimation based
on a Complex Watson Kernel Method}, booktitle={23rd European Signal Processing
Conference (EUSIPCO 2015)}, author={Drude, Lukas and Jacob, Florian and Haeb-Umbach,
Reinhold}, year={2015} }'
chicago: Drude, Lukas, Florian Jacob, and Reinhold Haeb-Umbach. “DOA-Estimation
Based on a Complex Watson Kernel Method.” In 23rd European Signal Processing
Conference (EUSIPCO 2015), 2015.
ieee: L. Drude, F. Jacob, and R. Haeb-Umbach, “DOA-Estimation based on a Complex
Watson Kernel Method,” in 23rd European Signal Processing Conference (EUSIPCO
2015), 2015.
mla: Drude, Lukas, et al. “DOA-Estimation Based on a Complex Watson Kernel Method.”
23rd European Signal Processing Conference (EUSIPCO 2015), 2015.
short: 'L. Drude, F. Jacob, R. Haeb-Umbach, in: 23rd European Signal Processing
Conference (EUSIPCO 2015), 2015.'
date_created: 2019-07-12T05:27:38Z
date_updated: 2022-01-06T06:51:08Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2015/DrJaHa15.pdf
oa: '1'
publication: 23rd European Signal Processing Conference (EUSIPCO 2015)
related_material:
link:
- description: Presentation
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2015/DrJaHa15_Presentation.pdf
status: public
title: DOA-Estimation based on a Complex Watson Kernel Method
type: conference
user_id: '44006'
year: '2015'
...
---
_id: '11810'
author:
- first_name: Jahn
full_name: Heymann, Jahn
id: '9168'
last_name: Heymann
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Aleksej
full_name: Chinaev, Aleksej
last_name: Chinaev
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Heymann J, Drude L, Chinaev A, Haeb-Umbach R. BLSTM supported GEV Beamformer
Front-End for the 3RD CHiME Challenge. In: Automatic Speech Recognition and
Understanding Workshop (ASRU 2015). ; 2015.'
apa: Heymann, J., Drude, L., Chinaev, A., & Haeb-Umbach, R. (2015). BLSTM supported
GEV Beamformer Front-End for the 3RD CHiME Challenge. In Automatic Speech Recognition
and Understanding Workshop (ASRU 2015).
bibtex: '@inproceedings{Heymann_Drude_Chinaev_Haeb-Umbach_2015, title={BLSTM supported
GEV Beamformer Front-End for the 3RD CHiME Challenge}, booktitle={Automatic Speech
Recognition and Understanding Workshop (ASRU 2015)}, author={Heymann, Jahn and
Drude, Lukas and Chinaev, Aleksej and Haeb-Umbach, Reinhold}, year={2015} }'
chicago: Heymann, Jahn, Lukas Drude, Aleksej Chinaev, and Reinhold Haeb-Umbach.
“BLSTM Supported GEV Beamformer Front-End for the 3RD CHiME Challenge.” In Automatic
Speech Recognition and Understanding Workshop (ASRU 2015), 2015.
ieee: J. Heymann, L. Drude, A. Chinaev, and R. Haeb-Umbach, “BLSTM supported GEV
Beamformer Front-End for the 3RD CHiME Challenge,” in Automatic Speech Recognition
and Understanding Workshop (ASRU 2015), 2015.
mla: Heymann, Jahn, et al. “BLSTM Supported GEV Beamformer Front-End for the 3RD
CHiME Challenge.” Automatic Speech Recognition and Understanding Workshop (ASRU
2015), 2015.
short: 'J. Heymann, L. Drude, A. Chinaev, R. Haeb-Umbach, in: Automatic Speech Recognition
and Understanding Workshop (ASRU 2015), 2015.'
date_created: 2019-07-12T05:28:41Z
date_updated: 2022-01-06T06:51:09Z
department:
- _id: '54'
language:
- iso: eng
publication: Automatic Speech Recognition and Understanding Workshop (ASRU 2015)
status: public
title: BLSTM supported GEV Beamformer Front-End for the 3RD CHiME Challenge
type: conference
user_id: '44006'
year: '2015'
...
---
_id: '11919'
abstract:
- lang: eng
text: In this paper we present a source counting algorithm to determine the number
of speakers in a speech mixture. In our proposed method, we model the histogram
of estimated directions of arrival with a nonparametric Bayesian infinite Gaussian
mixture model. As an alternative to classical model selection criteria and to
avoid specifying the maximum number of mixture components in advance, a Dirichlet
process prior is employed over the mixture components. This allows the optimal
number of mixture components that most probably model the observations to be
determined automatically. We demonstrate by experiments that this model outperforms a parametric
approach using a finite Gaussian mixture model with a Dirichlet distribution prior
over the mixture weights.
author:
- first_name: Oliver
full_name: Walter, Oliver
last_name: Walter
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Walter O, Drude L, Haeb-Umbach R. Source Counting in Speech Mixtures by Nonparametric
Bayesian Estimation of an infinite Gaussian Mixture Model. In: 40th International
Conference on Acoustics, Speech and Signal Processing (ICASSP 2015). ; 2015.'
apa: Walter, O., Drude, L., & Haeb-Umbach, R. (2015). Source Counting in Speech
Mixtures by Nonparametric Bayesian Estimation of an infinite Gaussian Mixture
Model. In 40th International Conference on Acoustics, Speech and Signal Processing
(ICASSP 2015).
bibtex: '@inproceedings{Walter_Drude_Haeb-Umbach_2015, title={Source Counting in
Speech Mixtures by Nonparametric Bayesian Estimation of an infinite Gaussian Mixture
Model}, booktitle={40th International Conference on Acoustics, Speech and Signal
Processing (ICASSP 2015)}, author={Walter, Oliver and Drude, Lukas and Haeb-Umbach,
Reinhold}, year={2015} }'
chicago: Walter, Oliver, Lukas Drude, and Reinhold Haeb-Umbach. “Source Counting
in Speech Mixtures by Nonparametric Bayesian Estimation of an Infinite Gaussian
Mixture Model.” In 40th International Conference on Acoustics, Speech and Signal
Processing (ICASSP 2015), 2015.
ieee: O. Walter, L. Drude, and R. Haeb-Umbach, “Source Counting in Speech Mixtures
by Nonparametric Bayesian Estimation of an infinite Gaussian Mixture Model,” in
40th International Conference on Acoustics, Speech and Signal Processing (ICASSP
2015), 2015.
mla: Walter, Oliver, et al. “Source Counting in Speech Mixtures by Nonparametric
Bayesian Estimation of an Infinite Gaussian Mixture Model.” 40th International
Conference on Acoustics, Speech and Signal Processing (ICASSP 2015), 2015.
short: 'O. Walter, L. Drude, R. Haeb-Umbach, in: 40th International Conference on
Acoustics, Speech and Signal Processing (ICASSP 2015), 2015.'
date_created: 2019-07-12T05:30:47Z
date_updated: 2022-01-06T06:51:12Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2015/WaDrHa15.pdf
oa: '1'
publication: 40th International Conference on Acoustics, Speech and Signal Processing
(ICASSP 2015)
related_material:
link:
- description: Poster
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2015/WaDrHa15_Poster.pdf
status: public
title: Source Counting in Speech Mixtures by Nonparametric Bayesian Estimation of
an infinite Gaussian Mixture Model
type: conference
user_id: '44006'
year: '2015'
...
---
_id: '11752'
abstract:
- lang: eng
text: 'In this contribution we derive a variational EM (VEM) algorithm for model
selection in complex Watson mixture models, which have recently been proposed
as a model of the distribution of normalized microphone array signals in the short-time
Fourier transform domain. The VEM algorithm is applied to count the number of
active sources in a speech mixture by iteratively estimating the mode vectors
of the Watson distributions and suppressing the signals from the corresponding
directions. A key theoretical contribution is the derivation of the MMSE estimate
of a quadratic form involving the mode vector of the Watson distribution. The
experimental results demonstrate the effectiveness of the source counting approach
at moderately low SNR. It is further shown that the VEM algorithm is more robust
with respect to the threshold values used.'
author:
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Aleksej
full_name: Chinaev, Aleksej
last_name: Chinaev
- first_name: Dang Hai
full_name: Tran Vu, Dang Hai
last_name: Tran Vu
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Drude L, Chinaev A, Tran Vu DH, Haeb-Umbach R. Source Counting in Speech Mixtures
Using a Variational EM Approach for Complex Watson Mixture Models. In: 39th
International Conference on Acoustics, Speech and Signal Processing (ICASSP 2014).
; 2014.'
apa: Drude, L., Chinaev, A., Tran Vu, D. H., & Haeb-Umbach, R. (2014). Source
Counting in Speech Mixtures Using a Variational EM Approach for Complex Watson
Mixture Models. In 39th International Conference on Acoustics, Speech and Signal
Processing (ICASSP 2014).
bibtex: '@inproceedings{Drude_Chinaev_Tran Vu_Haeb-Umbach_2014, title={Source Counting
in Speech Mixtures Using a Variational EM Approach for Complex Watson Mixture Models},
booktitle={39th International Conference on Acoustics, Speech and Signal Processing
(ICASSP 2014)}, author={Drude, Lukas and Chinaev, Aleksej and Tran Vu, Dang Hai
and Haeb-Umbach, Reinhold}, year={2014} }'
chicago: Drude, Lukas, Aleksej Chinaev, Dang Hai Tran Vu, and Reinhold Haeb-Umbach.
“Source Counting in Speech Mixtures Using a Variational EM Approach for Complex Watson
Mixture Models.” In 39th International Conference on Acoustics, Speech and
Signal Processing (ICASSP 2014), 2014.
ieee: L. Drude, A. Chinaev, D. H. Tran Vu, and R. Haeb-Umbach, “Source Counting
in Speech Mixtures Using a Variational EM Approach for Complex Watson Mixture Models,”
in 39th International Conference on Acoustics, Speech and Signal Processing
(ICASSP 2014), 2014.
mla: Drude, Lukas, et al. “Source Counting in Speech Mixtures Using a Variational
EM Approach for Complex Watson Mixture Models.” 39th International Conference
on Acoustics, Speech and Signal Processing (ICASSP 2014), 2014.
short: 'L. Drude, A. Chinaev, D.H. Tran Vu, R. Haeb-Umbach, in: 39th International
Conference on Acoustics, Speech and Signal Processing (ICASSP 2014), 2014.'
date_created: 2019-07-12T05:27:34Z
date_updated: 2022-01-06T06:51:08Z
department:
- _id: '54'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2014/DrChTrHa2014.pdf
oa: '1'
publication: 39th International Conference on Acoustics, Speech and Signal Processing
(ICASSP 2014)
related_material:
link:
- description: Poster
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2014/DrChTrHa2014_Poster.pdf
status: public
title: Source Counting in Speech Mixtures Using a Variational EM Approach for Complex Watson
Mixture Models
type: conference
user_id: '44006'
year: '2014'
...
---
_id: '11753'
abstract:
- lang: eng
text: This contribution describes a step-wise source counting algorithm to determine
the number of speakers in an offline scenario. Each speaker is identified by a
variational expectation maximization (VEM) algorithm for complex Watson mixture
models, which therefore directly yields beamforming vectors for a subsequent speech
separation process. An observation selection criterion is proposed which improves
the robustness of the source counting in noise. The algorithm is compared to an
alternative VEM approach with Gaussian mixture models based on directions of arrival
and shown to deliver improved source counting accuracy. The article concludes
by extending the offline algorithm towards a low-latency online estimation of
the number of active sources from the streaming input data.
author:
- first_name: Lukas
full_name: Drude, Lukas
id: '11213'
last_name: Drude
- first_name: Aleksej
full_name: Chinaev, Aleksej
last_name: Chinaev
- first_name: Dang Hai
full_name: Tran Vu, Dang Hai
last_name: Tran Vu
- first_name: Reinhold
full_name: Haeb-Umbach, Reinhold
id: '242'
last_name: Haeb-Umbach
citation:
ama: 'Drude L, Chinaev A, Tran Vu DH, Haeb-Umbach R. Towards Online Source Counting
in Speech Mixtures Applying a Variational EM for Complex Watson Mixture Models.
In: 14th International Workshop on Acoustic Signal Enhancement (IWAENC 2014).
; 2014:213-217.'
apa: Drude, L., Chinaev, A., Tran Vu, D. H., & Haeb-Umbach, R. (2014). Towards
Online Source Counting in Speech Mixtures Applying a Variational EM for Complex
Watson Mixture Models. In 14th International Workshop on Acoustic Signal Enhancement
(IWAENC 2014) (pp. 213–217).
bibtex: '@inproceedings{Drude_Chinaev_Tran Vu_Haeb-Umbach_2014, title={Towards Online
Source Counting in Speech Mixtures Applying a Variational EM for Complex Watson
Mixture Models}, booktitle={14th International Workshop on Acoustic Signal Enhancement
(IWAENC 2014)}, author={Drude, Lukas and Chinaev, Aleksej and Tran Vu, Dang Hai
and Haeb-Umbach, Reinhold}, year={2014}, pages={213–217} }'
chicago: Drude, Lukas, Aleksej Chinaev, Dang Hai Tran Vu, and Reinhold Haeb-Umbach.
“Towards Online Source Counting in Speech Mixtures Applying a Variational EM for
Complex Watson Mixture Models.” In 14th International Workshop on Acoustic
Signal Enhancement (IWAENC 2014), 213–17, 2014.
ieee: L. Drude, A. Chinaev, D. H. Tran Vu, and R. Haeb-Umbach, “Towards Online Source
Counting in Speech Mixtures Applying a Variational EM for Complex Watson Mixture
Models,” in 14th International Workshop on Acoustic Signal Enhancement (IWAENC
2014), 2014, pp. 213–217.
mla: Drude, Lukas, et al. “Towards Online Source Counting in Speech Mixtures Applying
a Variational EM for Complex Watson Mixture Models.” 14th International Workshop
on Acoustic Signal Enhancement (IWAENC 2014), 2014, pp. 213–17.
short: 'L. Drude, A. Chinaev, D.H. Tran Vu, R. Haeb-Umbach, in: 14th International
Workshop on Acoustic Signal Enhancement (IWAENC 2014), 2014, pp. 213–217.'
date_created: 2019-07-12T05:27:35Z
date_updated: 2022-01-06T06:51:08Z
department:
- _id: '54'
keyword:
- Accuracy
- Acoustics
- Estimation
- Mathematical model
- Source separation
- Speech
- Vectors
- Bayes methods
- Blind source separation
- Directional statistics
- Number of speakers
- Speaker diarization
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://groups.uni-paderborn.de/nt/pubs/2014/DrChTrHaeb14.pdf
oa: '1'
page: 213-217
publication: 14th International Workshop on Acoustic Signal Enhancement (IWAENC 2014)
related_material:
link:
- description: Poster
relation: supplementary_material
url: https://groups.uni-paderborn.de/nt/pubs/2014/DrChTrHaeb14_Poster.pdf
status: public
title: Towards Online Source Counting in Speech Mixtures Applying a Variational EM
for Complex Watson Mixture Models
type: conference
user_id: '44006'
year: '2014'
...