TY  - JOUR
AB  - In this study, we evaluate the impact of gender-biased data from German-language physician reviews on the fairness of fine-tuned language models. For two different downstream tasks, we use data reported to be gender biased and aggregate it with annotations. First, we propose a new approach to aspect-based sentiment analysis that allows identifying, extracting, and classifying implicit and explicit aspect phrases and their polarity within a single model. The second task we present is grade prediction, where we predict the overall grade of a review on the basis of the review text. For both tasks, we train numerous transformer models and evaluate their performance. The aggregation of sensitive attributes, such as a physician’s gender and migration background, with individual text reviews allows us to measure the performance of the models with respect to these sensitive groups. These group-wise performance measures act as extrinsic bias measures for our downstream tasks. In addition, we translate several gender-specific templates of the intrinsic bias metrics into the German language and evaluate our fine-tuned models. Based on this set of tasks, fine-tuned models, and intrinsic and extrinsic bias measures, we perform correlation analyses between intrinsic and extrinsic bias measures. In terms of sensitive groups and effect sizes, our bias measure results show different directions. Furthermore, correlations between measures of intrinsic and extrinsic bias can be observed in different directions. This leads us to conclude that gender-biased data does not inherently lead to biased models. Other variables, such as template dependency for intrinsic measures and label distribution in the data, must be taken into account as they strongly influence the metric results. Therefore, we suggest that metrics and templates should be chosen according to the given task and the biases to be assessed. 
AU  - Kersting, Joschka
AU  - Maoro, Falk
AU  - Geierhos, Michaela
ID  - 53801
JF  - Data & Knowledge Engineering
KW  - Language model fairness
KW  - Aspect phrase classification
KW  - Grade prediction
KW  - Physician reviews
SN  - 0169-023X
TI  - Towards comparable ratings: Exploring bias in German physician reviews
VL  - 148
ER  - 
TY  - BOOK
AB  - In the proposal for our CRC in 2011, we formulated a vision of markets for
IT services that describes an approach to the provision of such services
that was novel at that time and, to a large extent, remains so today:
"Our vision of on-the-fly computing is that of IT services individually and
automatically configured and brought to execution from flexibly combinable
services traded on markets. At the same time, we aim at organizing
markets whose participants maintain a lively market of services through
appropriate entrepreneurial actions."
Over the last 12 years, we have developed methods and techniques to
address problems critical to the convenient, efficient, and secure use of
on-the-fly computing. Among other things, we have made the description
of services more convenient by allowing natural language input,
increased the quality of configured services through (natural language)
interaction and more efficient configuration processes and analysis
procedures, made the quality of (the products of) providers in the
marketplace transparent through reputation systems, and increased the
resource efficiency of execution through reconfigurable heterogeneous
computing nodes and an integrated treatment of service description and
configuration. We have also developed network infrastructures that have
a high degree of adaptivity, scalability, efficiency, and reliability, and
provide cryptographic guarantees of anonymity and security for market
participants and their products and services.
To demonstrate the pervasiveness of the OTF computing approach, we
have implemented a proof-of-concept for OTF computing that can run
typical scenarios of an OTF market. We illustrated the approach using
a cutting-edge application scenario – automated machine learning (AutoML).
Finally, we have been pushing our work for the perpetuation of
On-The-Fly Computing beyond the SFB and sharing the expertise gained
in the SFB in events with industry partners as well as transfer projects.
This work required a broad spectrum of expertise. Computer scientists
and economists with research interests such as computer networks and
distributed algorithms, security and cryptography, software engineering
and verification, configuration and machine learning, computer engineering
and HPC, microeconomics and game theory, business informatics
and management have successfully collaborated here.
AU  - Haake, Claus-Jochen
AU  - Meyer auf der Heide, Friedhelm
AU  - Platzner, Marco
AU  - Wachsmuth, Henning
AU  - Wehrheim, Heike
ID  - 45863
TI  - On-The-Fly Computing -- Individualized IT-services in dynamic markets
VL  - 412
ER  - 
TY  - THES
AB  - Reading between the lines has so far been reserved for humans. The present dissertation addresses this research gap using machine learning methods.
Implicit expressions are not comprehensible to computers and cannot be localized in the text. However, many texts arise on interpersonal topics that, unlike commercial evaluation texts, often imply information only by means of longer phrases. Examples are the kindness and the attentiveness of a doctor, which are only paraphrased (“he didn’t even look me in the eye”). The analysis of such data, especially the identification and localization of implicit statements, is a research gap (1). This work uses so-called Aspect-based Sentiment Analysis as a method for this purpose. It remains open how the aspect categories to be extracted can be discovered and thematically delineated based on the data (2). Furthermore, it has not yet been explored what a collection of tools for identifying implicit phrases, and thus making them explicit, should look like (3). Last, it is an open question how to correlate the identified phrases from the text data with other data, including the investigation of the relationship between quantitative scores (e.g., school grades) and the thematically related text (4). Based on these research gaps, the research question is posed as follows: Using text mining methods, how can implicit rating content be properly interpreted and thus made explicit before it is automatically categorized and quantified?
The uniqueness of this dissertation is based on the automated recognition of implicit linguistic statements alongside explicit statements. These are identified in unstructured text data so that features expressed only in the text can later be compared across data sources, even though they were not included in rating categories such as stars or school grades. German-language physician ratings from websites in three countries serve as the sample domain. The solution approach consists of data creation, a pipeline for text processing, and analyses based on this pipeline. In the data creation, aspect classes are identified and delineated across platforms and marked in text data. This results in six datasets with over 70,000 annotated sentences and detailed guidelines. The models that were created based on the training data extract and categorize the aspects. In addition, the sentiment polarity and the evaluation weight, i.e., the importance of each phrase, are determined. The models, which are combined in a pipeline, are used in a prototype in the form of a web application. The analyses built on the pipeline quantify the rating contents by linking the obtained information with further data, thus allowing new insights.
As a result, a toolbox is provided to identify quantifiable rating content and categories using text mining for a sample domain. This is used to evaluate the approach, which in principle can also be adapted to any other domain.
AU  - Kersting, Joschka
ID  - 44323
TI  - Identifizierung quantifizierbarer Bewertungsinhalte und -kategorien mittels Text Mining
ER  - 
TY  - CHAP
AU  - Bäumer, Frederik Simon
AU  - Chen, Wei-Fan
AU  - Geierhos, Michaela
AU  - Kersting, Joschka
AU  - Wachsmuth, Henning
ED  - Haake, Claus-Jochen
ED  - Meyer auf der Heide, Friedhelm
ED  - Platzner, Marco
ED  - Wachsmuth, Henning
ED  - Wehrheim, Heike
ID  - 45882
T2  - On-The-Fly Computing -- Individualized IT-services in dynamic markets
TI  - Dialogue-based Requirement Compensation and Style-adjusted Data-to-text Generation
VL  - 412
ER  - 
TY  - CHAP
AB  - We present a concept for quantifying evaluative phrases to later compare rating texts numerically instead of just relying on stars or grades. We achieve this by combining deep learning models in an aspect-based sentiment analysis pipeline along with sentiment weighting, polarity, and correlation analyses that combine deep learning results with metadata. The results provide new insights for the medical field. Our application domain, physician reviews, shows that there are millions of review texts on the Internet that cannot yet be comprehensively analyzed because previous studies have focused on explicit aspects from other domains (e.g., products). We identify, extract, and classify implicit and explicit aspect phrases equally from German-language review texts. To do so, we annotated aspect phrases representing reviews on numerous aspects of a physician, medical practice, or practice staff. We apply the best performing transformer model, XLM-RoBERTa, to a large physician review dataset and correlate the results with existing metadata. As a result, we can show different correlations between the sentiment polarity of certain aspect classes (e.g., friendliness, practice equipment) and physicians’ professions (e.g., surgeon, ophthalmologist). This gives us individual numerical scores that contain a variety of information based on deep learning algorithms that extract textual (evaluative) information and metadata from the Web.
AU  - Kersting, Joschka
AU  - Geierhos, Michaela
ED  - Cuzzocrea, Alfredo
ED  - Gusikhin, Oleg
ED  - Hammoudi, Slimane
ED  - Quix, Christoph
ID  - 46205
SN  - 1865-0929
T2  - Data Management Technologies and Applications
TI  - Towards Comparable Ratings: Quantifying Evaluative Phrases in Physician Reviews
VL  - 1860
ER  - 
TY  - CONF
AU  - Chen, Wei-Fan
AU  - Chen, Mei-Hua
AU  - Mudgal, Garima
AU  - Wachsmuth, Henning
ID  - 33274
T2  - Proceedings of the 9th Workshop on Argument Mining (ArgMining 2022)
TI  - Analyzing Culture-Specific Argument Structures in Learner Essays
ER  - 
TY  - CHAP
AB  - This work addresses the automatic resolution of software requirements. In the vision of On-The-Fly Computing, software services should be composed on demand, based solely on natural language input from human users. To enable this, we build a chatbot solution that works with human-in-the-loop support to receive, analyze, correct, and complete users' software requirements. The chatbot is equipped with a natural language processing pipeline and a large knowledge base, as well as sophisticated dialogue management skills to enhance the user experience. Previous solutions have focused on analyzing software requirements to point out errors such as vagueness, ambiguity, or incompleteness. Our work shows how apps can collaborate with users to efficiently produce correct requirements. We developed and compared three different chatbot apps that can work with built-in knowledge. We rely on ChatterBot, DialoGPT and Rasa for this purpose. While DialoGPT provides its own knowledge base, Rasa is the best system to combine the text mining and knowledge solutions at our disposal. The evaluation shows that users accept 73% of the suggested answers from Rasa, while they accept only 63% from DialoGPT and only 36% from ChatterBot.
AU  - Kersting, Joschka
AU  - Ahmed, Mobeen
AU  - Geierhos, Michaela
ED  - Stephanidis, Constantine
ED  - Antona, Margherita
ED  - Ntoa, Stavroula
ID  - 32179
KW  - On-The-Fly Computing
KW  - Chatbot
KW  - Knowledge Base
SN  - 1865-0929
T2  - HCI International 2022 Posters
TI  - Chatbot-Enhanced Requirements Resolution for Automated Service Compositions
VL  - 1580
ER  - 
TY  - CONF
AB  - This paper aims at discussing past limitations set in sentiment analysis research regarding explicit and implicit mentions of opinions. Previous studies have regularly neglected this question in favor of methodical research on standard datasets. Furthermore, they were limited to linguistically less diverse domains, such as commercial product reviews. We address this issue by annotating a German-language physician review dataset that contains numerous implicit, long, and complex statements that indicate aspect ratings, such as the physician’s friendliness. We discuss the nature of implicit statements and present various samples to illustrate the challenge described.
AU  - Kersting, Joschka
AU  - Bäumer, Frederik Simon
ED  - Kersting, Joschka
ID  - 31054
KW  - Sentiment analysis
KW  - Natural language processing
KW  - Aspect phrase extraction
T2  - Proceedings of the Fourteenth International Conference on Pervasive Patterns and Applications (PATTERNS 2022): Special Track AI-DRSWA: Maturing Artificial Intelligence - Data Science for Real-World Applications
TI  - Implicit Statements in Healthcare Reviews: A Challenge for Sentiment Analysis
ER  - 
TY  - GEN
AU  - Chen, Mei-Hua
AU  - Mudgal, Garima
AU  - Chen, Wei-Fan
AU  - Wachsmuth, Henning
ID  - 31068
T2  - EUROCALL
TI  - Investigating the argumentation structures of EFL learners from diverse language backgrounds
ER  - 
TY  - GEN
ED  - Kersting, Joschka
ID  - 53803
TI  - PATTERNS 2022: The Fourteenth International Conference on Pervasive Patterns and Applications
ER  - 
TY  - GEN
AB  - This thesis aims to provide a bidirectional chatbot solution for the requirement engineering process. The Sonderforschungsbereich (SFB) 901 intends to provide the composition of software services On-the-Fly (OTF). The sub-project (B1) of the SFB 901 project deals with the parameters of service configuration. OTF Computing aims to eradicate the dependency on requirement engineers in the software development process. However, there is no existing bidirectional chatbot solution that analyzes user software requirements and provides viable suggestions to the user regarding their service. Previously, the CORDULA chatbot was developed to analyze software requirements, but it cannot keep the conversation’s context. To solve this issue, the Rasa framework is integrated with a knowledge base that provides domain-specific knowledge to the chatbot. The software description is passed through the natural language understanding process to give consciousness to the chatbot. This process involves various machine learning models, including app family classification, to correctly identify the domain for the user's OTF service. Statistical models such as naïve Bayes, kNN and SVM are compared with transformer models for this classification task. Furthermore, the entities (functional requirements) are also separated from the user description.
The chatbot suggests requirements from the preliminary service template with the support of the knowledge base. Furthermore, the generated response is compared with the state-of-the-art DialoGPT transformer model and the ChatterBot conversational library. These models are trained on a conversational dataset related to software development. All responses are ranked using the DialoRPT model, and the BLEU score is used to evaluate the models’ responses. Moreover, the chatbot models are tested with human participants, who used and scored the chatbot responses based on effectiveness, efficiency and satisfaction. The overall response accuracy is also measured by averaging the user approval over the generated responses.
AU  - Ahmed, Mobeen
ID  - 29000
TI  - Knowledge Base Enhanced & User-centric Dialogue Design for OTF Computing
ER  - 
TY  - GEN
AU  - Palushi, Juela
ID  - 45790
TI  - Domain-aware Text Professionalization using Sequence-to-Sequence Neural Networks
ER  - 
TY  - GEN
AU  - Budanurmath, Vinaykumar
ID  - 45789
TI  - Propaganda Technique Detection Using Connotation Frames
ER  - 
TY  - CONF
AB  - Content is the new oil. Users consume billions of terabytes a day while surfing on news sites or blogs, posting on social media sites, and sending chat messages around the globe. While content is heterogeneous, the dominant form of web content is text. There are situations where more diversity needs to be introduced into text content, for example, to reuse it on websites or to allow a chatbot to base its models on the information conveyed rather than on the language used. In order to achieve this, paraphrasing techniques have been developed: One example is text spinning, a technique that automatically paraphrases text while leaving the intent intact. This makes it easier to reuse content, or to make the language generated by a bot more human. One method for modifying texts is a combination of translation and back-translation. This paper presents NATTS, a naive approach that uses transformer-based translation models to create diversified text, combining translation steps in one model. An advantage of this approach is that it can be fine-tuned and handle technical language.
AU  - Bäumer, Frederik Simon
AU  - Kersting, Joschka
AU  - Denisov, Sergej
AU  - Geierhos, Michaela
ID  - 26049
KW  - Software Requirements
KW  - Natural Language Processing
KW  - Transfer Learning
KW  - On-The-Fly Computing
T2  - PROCEEDINGS OF THE INTERNATIONAL CONFERENCES ON WWW/INTERNET 2021 AND APPLIED COMPUTING 2021
TI  - IN OTHER WORDS: A NAIVE APPROACH TO TEXT SPINNING
ER  - 
TY  - CHAP
AB  - This chapter concentrates on aspect-based sentiment analysis, a form of opinion mining where algorithms detect sentiments expressed about features of products, services, etc. We especially focus on novel approaches for aspect phrase extraction and classification trained on feature-rich datasets. Here, we present two new datasets, which we gathered from the linguistically rich domain of physician reviews, as other investigations have mainly concentrated on commercial reviews and social media reviews so far. To give readers a better understanding of the underlying datasets, we describe the annotation process and inter-annotator agreement in detail. In our research, we automatically assess implicit mentions or indications of specific aspects. To do this, we propose and utilize neural network models that perform the here-defined aspect phrase extraction and classification task, achieving F1-score values of about 80% and accuracy values of more than 90%. As we apply our models to a comparatively complex domain, we obtain promising results. 
AU  - Kersting, Joschka
AU  - Geierhos, Michaela
ED  - Loukanova, Roussanka
ID  - 17905
T2  - Natural Language Processing in Artificial Intelligence -- NLPinAI 2020
TI  - Towards Aspect Extraction and Classification for Opinion Mining with Deep Sequence Networks
VL  - 939
ER  - 
TY  - CONF
AU  - Kersting, Joschka
AU  - Geierhos, Michaela
ID  - 22051
T2  - Proceedings of the 10th International Conference on Data Science, Technology and Applications (DATA 2021)
TI  - Well-being in Plastic Surgery: Deep Learning Reveals Patients' Evaluations
ER  - 
TY  - CHAP
AB  - In this study, we describe a text processing pipeline that transforms user-generated text into structured data. To do this, we train neural and transformer-based models for aspect-based sentiment analysis. As most research deals with explicit aspects from product or service data, we extract and classify implicit and explicit aspect phrases from German-language physician review texts. Patients often rate on the basis of perceived friendliness or competence. The vocabulary is difficult, the topic sensitive, and the data user-generated. The aspect phrases come with various wordings using insertions and are not noun-based, which makes the presented case equally relevant and reality-based. To find complex, indirect aspect phrases, up-to-date deep learning approaches must be combined with supervised training data. We describe three aspect phrase datasets, one of them new, as well as a newly annotated aspect polarity dataset. Alongside this, we build an algorithm to rate the aspect phrase importance. All in all, we train eight transformers on the new raw data domain, compare 54 neural aspect extraction models and, based on this, create eight aspect polarity models for our pipeline. These models are evaluated by using Precision, Recall, and F-Score measures. Finally, we evaluate our aspect phrase importance measure algorithm.
AU  - Kersting, Joschka
AU  - Geierhos, Michaela
ED  - Kapetanios, Epaminondas
ED  - Horacek, Helmut
ED  - Métais, Elisabeth
ED  - Meziane, Farid
ID  - 22052
T2  - Natural Language Processing and Information Systems
TI  - Human Language Comprehension in Aspect Phrase Extraction with Importance Weighting
VL  - 12801
ER  - 
TY  - CONF
AU  - Chen, Wei-Fan
AU  - Al Khatib, Khalid
AU  - Stein, Benno
AU  - Wachsmuth, Henning
ID  - 23709
T2  - Findings of the Association for Computational Linguistics: EMNLP 2021
TI  - Controlled Neural Sentence-Level Reframing of News Articles
ER  - 
TY  - CONF
AU  - Alshomary, Milad
AU  - Syed, Shahbaz
AU  - Potthast, Martin
AU  - Wachsmuth, Henning
ID  - 22229
T2  - Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)
TI  - Argument Undermining: Counter-Argument Generation by Attacking Weak Premises
ER  - 
TY  - GEN
AU  - Bülling, Jonas
ID  - 45788
TI  - Political Speaker Transfer: Learning to Generate Text in the Styles of Barack Obama and Donald Trump
ER  -