TY - CONF
AB - Multi-talker speech and moving speakers still pose a significant challenge to automatic speech recognition systems. Assuming an enrollment utterance of the target speaker is available, the so-called SpeakerBeam concept has recently been proposed to extract the target speaker from a speech mixture. If multi-channel input is available, spatial properties of the speaker can be exploited to support the source extraction. In this contribution we investigate different approaches to exploiting such spatial information. In particular, we are interested in the question of how useful this information is if the target speaker changes his/her position. To this end, we present a SpeakerBeam-based source extraction network that is adapted to work on moving speakers by recursively updating the beamformer coefficients. Experimental results are presented on two data sets, one with artificially created room impulse responses, and one with real room impulse responses and noise recorded in a conference room. Interestingly, spatial features turn out to be advantageous even if the speaker position changes.
AU - Heitkaemper, Jens
AU - Feher, Thomas
AU - Freitag, Michael
AU - Haeb-Umbach, Reinhold
ID - 14822
T2 - International Conference on Statistical Language and Speech Processing 2019, Ljubljana, Slovenia
TI - A Study on Online Source Extraction in the Presence of Changing Speaker Positions
ER -
TY - CONF
AB - This paper deals with multi-channel speech recognition in scenarios with multiple speakers. Recently, the spectral characteristics of a target speaker, extracted from an adaptation utterance, have been used to guide a neural network mask estimator to focus on that speaker. In this work we present two variants of speaker-aware neural networks, which exploit both spectral and spatial information to allow better discrimination between target and interfering speakers. Thus, we introduce either a spatial preprocessing step prior to the mask estimation or a spatial plus spectral speaker characterization block whose output is directly fed into the neural mask estimator. The target speaker's spectral and spatial signature is extracted from an adaptation utterance recorded at the beginning of a session. We further adapt the architecture for low-latency processing by means of block-online beamforming that recursively updates the signal statistics. Experimental results show that the additional spatial information clearly improves source extraction, in particular in the same-gender case, and that our proposal achieves state-of-the-art performance in terms of distortion reduction and recognition accuracy.
AU - Martin-Donas, Juan M.
AU - Heitkaemper, Jens
AU - Haeb-Umbach, Reinhold
AU - Gomez, Angel M.
AU - Peinado, Antonio M.
ID - 14824
T2 - INTERSPEECH 2019, Graz, Austria
TI - Multi-Channel Block-Online Source Extraction based on Utterance Adaptation
ER -
TY - CONF
AB - In this paper, we present Hitachi and Paderborn University's joint effort on automatic speech recognition (ASR) in a dinner party scenario. The main challenges for ASR systems on dinner party recordings obtained by multiple microphone arrays are (1) heavy speech overlaps, (2) severe noise and reverberation, (3) very natural conversational content, and possibly (4) insufficient training data. As an example of a dinner party scenario, we have chosen the data presented during the CHiME-5 speech recognition challenge, where the baseline ASR had a 73.3% word error rate (WER), and even the best performing system at the CHiME-5 challenge had a 46.1% WER. We extensively investigated the combination of a guided source separation-based speech enhancement technique and a previously proposed strong ASR backend, and found that a tight combination of these techniques provided substantial accuracy improvements. Our final system achieved WERs of 39.94% and 41.64% for the development and evaluation data, respectively, both of which are the best published results for the dataset. We also experimented with training data beyond the official small dataset of the CHiME-5 corpus to assess the intrinsic difficulty of this ASR task.
AU - Kanda, Naoyuki
AU - Boeddeker, Christoph
AU - Heitkaemper, Jens
AU - Fujita, Yusuke
AU - Horiguchi, Shota
AU - Haeb-Umbach, Reinhold
ID - 14826
T2 - INTERSPEECH 2019, Graz, Austria
TI - Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR
ER -
TY - GEN
AU - Mair, Christina
AU - Scheffler, Wolfram
AU - Senger, Isabell
AU - Sureth-Sloane, Caren
ID - 14902
TI - Analyse der Veränderung der zwischenstaatlichen Gewinnaufteilung bei Einführung einer standardisierten Gewinnverteilungsmethode am Beispiel des Einsatzes von 3D-Druckern
VL - 42
ER -
TY - JOUR
AB - We investigate optical microresonators consisting of either one or two coupled rectangular strips between upper and lower slab waveguides. The cavities are evanescently excited under oblique angles by thin-film guided, in-plane unguided waves supported by one of the slab waveguides. Beyond a specific incidence angle, losses are fully suppressed. The interaction between the guided mode of the cavity strip and the incoming slab modes leads to resonant behavior for specific incidence angles and gaps. For a single cavity, at resonance, the input power is split equally among the four output ports, while for two cavities an add-drop filter can be realized that, at resonance, routes the incoming power completely to the forward drop waveguide via the cavity. For both applications, the strength of the interaction is controlled by the gaps between cavities and waveguides.
AU - Ebers, Lena
AU - Hammer, Manfred
AU - Berkemeier, Manuel B.
AU - Menzel, Alexander
AU - Förstner, Jens
ID - 14990
JF - OSA Continuum
KW - tet_topic_waveguides
SN - 2578-7519
TI - Coupled microstrip-cavities under oblique incidence of semi-guided waves: a lossless integrated optical add-drop filter
VL - 2
ER -
TY - JOUR
AB - Many problem settings in machine learning are concerned with the simultaneous prediction of multiple target variables of diverse type. Amongst others, such problem settings arise in multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. These subfields of machine learning are typically studied in isolation, without highlighting or exploring important relationships. In this paper, we present a unifying view on what we call multi-target prediction (MTP) problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties which distinguish such methods and determine their suitability for different types of problems. Finally, we also discuss a few challenges for future research.
AU - Waegeman, Willem
AU - Dembczynski, Krzysztof
AU - Hüllermeier, Eyke
ID - 15002
IS - 2
JF - Data Mining and Knowledge Discovery
SN - 1573-756X
TI - Multi-target prediction: a unifying view on problems and methods
VL - 33
ER -
TY - CONF
AU - Melnikov, Vitaly
AU - Hüllermeier, Eyke
ID - 15007
T2 - Proceedings ACML, Asian Conference on Machine Learning (Proceedings of Machine Learning Research, 101)
TI - Learning to Aggregate: Tackling the Aggregation/Disaggregation Problem for OWA
ER -
TY - CONF
AU - Tornede, Alexander
AU - Wever, Marcel Dominik
AU - Hüllermeier, Eyke
ED - Hoffmann, Frank
ED - Hüllermeier, Eyke
ED - Mikut, Ralf
ID - 15011
SN - 978-3-7315-0979-0
T2 - Proceedings - 29. Workshop Computational Intelligence, Dortmund, 28. - 29. November 2019
TI - Algorithm Selection as Recommendation: From Collaborative Filtering to Dyad Ranking
ER -
TY - JOUR
AU - Brandt, Sascha
AU - Jähn, Claudius
AU - Fischer, Matthias
AU - Meyer auf der Heide, Friedhelm
ID - 16337
IS - 7
JF - Computer Graphics Forum
SN - 0167-7055
TI - Visibility-Aware Progressive Farthest Point Sampling on the GPU
VL - 38
ER -
TY - GEN
AB - We present a technique for rendering highly complex 3D scenes in real time by generating uniformly distributed points on the scene's visible surfaces. The technique is applicable to a wide range of scene types, such as scenes based directly on complex and detailed CAD data consisting of billions of polygons (in contrast to scenes handcrafted solely for visualization). This makes it possible to visualize such scenes smoothly, even in VR on an HMD, with good image quality while maintaining the necessary frame rates. In contrast to other point-based rendering methods, we place points in an approximated blue-noise distribution only on visible surfaces and store them in a highly GPU-efficient data structure, which allows the number of rendered points to be progressively refined to maximize the image quality for a given target frame rate. Our evaluation shows that scenes consisting of a large number of polygons can be rendered at interactive frame rates with good visual quality on standard hardware.
AU - Brandt, Sascha
AU - Jähn, Claudius
AU - Fischer, Matthias
AU - Meyer auf der Heide, Friedhelm
ID - 16341
T2 - arXiv:1904.08225
TI - Rendering of Complex Heterogenous Scenes using Progressive Blue Surfels
ER -