TY - CONF AB - Many applications require explainable node classification in knowledge graphs. Towards this end, a popular "white-box" approach is class expression learning: Given sets of positive and negative nodes, class expressions in description logics are learned that separate positive from negative nodes. Most existing approaches are search-based approaches generating many candidate class expressions and selecting the best one. However, they often take a long time to find suitable class expressions. In this paper, we cast class expression learning as a translation problem and propose a new family of class expression learning approaches which we dub neural class expression synthesizers. Training examples are "translated" into class expressions in a fashion akin to machine translation. Consequently, our synthesizers are not subject to the runtime limitations of search-based approaches. We study three instances of this novel family of approaches based on LSTMs, GRUs, and set transformers, respectively. An evaluation of our approach on four benchmark datasets suggests that it can effectively synthesize high-quality class expressions with respect to the input examples in approximately one second on average. Moreover, a comparison to state-of-the-art approaches suggests that we achieve better F-measures on large datasets. 
For reproducibility purposes, we provide our implementation as well as pretrained models in our public GitHub repository at https://github.com/dice-group/NeuralClassExpressionSynthesis AU - KOUAGOU, N'Dah Jean AU - Heindorf, Stefan AU - Demir, Caglar AU - Ngonga Ngomo, Axel-Cyrille ED - Pesquita, Catia ED - Jimenez-Ruiz, Ernesto ED - McCusker, Jamie ED - Faria, Daniel ED - Dragoni, Mauro ED - Dimou, Anastasia ED - Troncy, Raphael ED - Hertling, Sven ID - 33734 KW - Neural network KW - Concept learning KW - Description logics T2 - The Semantic Web - 20th Extended Semantic Web Conference (ESWC 2023) TI - Neural Class Expression Synthesis VL - 13870 ER - TY - GEN AB - Knowledge bases are widely used for information management on the web, enabling high-impact applications such as web search, question answering, and natural language processing. They also serve as the backbone for automatic decision systems, e.g. for medical diagnostics and credit scoring. As stakeholders affected by these decisions would like to understand their situation and verify fair decisions, a number of explanation approaches have been proposed using concepts in description logics. However, the learned concepts can become long and difficult to fathom for non-experts, even when verbalized. Moreover, long concepts do not immediately provide a clear path of action to change one's situation. Counterfactuals answering the question "How must feature values be changed to obtain a different classification?" have been proposed as short, human-friendly explanations for tabular data. In this paper, we transfer the notion of counterfactuals to description logics and propose the first algorithm for generating counterfactual explanations in the description logic $\mathcal{ELH}$. Counterfactual candidates are generated from concepts and the candidates with fewest feature changes are selected as counterfactuals. 
In case of multiple counterfactuals, we rank them according to the likeliness of their feature combinations. For evaluation, we conduct a user survey to investigate which of the generated counterfactual candidates are preferred for explanation by participants. In a second study, we explore possible use cases for counterfactual explanations. AU - Sieger, Leonie Nora AU - Heindorf, Stefan AU - Blübaum, Lukas AU - Ngonga Ngomo, Axel-Cyrille ID - 37937 T2 - arXiv:2301.05109 TI - Counterfactual Explanations for Concepts in ELH ER - TY - CONF AU - Baci, Alkid AU - Heindorf, Stefan ID - 46575 T2 - CIKM TI - Accelerating Concept Learning via Sampling ER - TY - CHAP AB - Class expression learning in description logics has long been regarded as an iterative search problem in an infinite conceptual space. Each iteration of the search process invokes a reasoner and a heuristic function. The reasoner finds the instances of the current expression, and the heuristic function computes the information gain and decides on the next step to be taken. As the size of the background knowledge base grows, search-based approaches for class expression learning become prohibitively slow. Current neural class expression synthesis (NCES) approaches investigate the use of neural networks for class expression learning in the attributive language with complement (ALC). While they show significant improvements over search-based approaches in runtime and quality of the computed solutions, they rely on the availability of pretrained embeddings for the input knowledge base. Moreover, they are not applicable to ontologies in more expressive description logics. In this paper, we propose a novel NCES approach which extends the state of the art to the description logic ALCHIQ(D). 
Our extension, dubbed NCES2, comes with an improved training data generator and does not require pretrained embeddings for the input knowledge base as both the embedding model and the class expression synthesizer are trained jointly. Empirical results on benchmark datasets suggest that our approach inherits the scalability of current NCES instances with the additional advantage that it supports more complex learning problems. NCES2 achieves the highest performance overall when compared to search-based approaches and to its predecessor NCES. We provide our source code, datasets, and pretrained models at https://github.com/dice-group/NCES2. AU - Kouagou, N'Dah Jean AU - Heindorf, Stefan AU - Demir, Caglar AU - Ngonga Ngomo, Axel-Cyrille ID - 47421 SN - 0302-9743 T2 - Machine Learning and Knowledge Discovery in Databases: Research Track TI - Neural Class Expression Synthesis in ALCHIQ(D) ER - TY - CHAP AU - Ngonga Ngomo, Axel-Cyrille AU - Demir, Caglar AU - Kouagou, N'Dah Jean AU - Heindorf, Stefan AU - Karalis, Nikolaos AU - Bigerl, Alexander ID - 46460 T2 - Compendium of Neurosymbolic Artificial Intelligence TI - Class Expression Learning with Multiple Representations ER - TY - JOUR AU - Demir, Caglar AU - Wiebesiek, Michel AU - Lu, Renzhong AU - Ngonga Ngomo, Axel-Cyrille AU - Heindorf, Stefan ID - 46248 JF - ECML PKDD TI - LitCQD: Multi-Hop Reasoning in Incomplete Knowledge Graphs with Numeric Literals ER - TY - CHAP AU - KOUAGOU, N'Dah Jean AU - Heindorf, Stefan AU - Demir, Caglar AU - Ngonga Ngomo, Axel-Cyrille ID - 33740 SN - 0302-9743 T2 - The Semantic Web TI - Learning Concept Lengths Accelerates Concept Learning in ALC ER - TY - CONF AB - Classifying nodes in knowledge graphs is an important task, e.g., predicting missing types of entities, predicting which molecules cause cancer, or predicting which drugs are promising treatment candidates. 
While black-box models often achieve high predictive performance, they are only post-hoc and locally explainable and do not allow the learned model to be easily enriched with domain knowledge. Towards this end, learning description logic concepts from positive and negative examples has been proposed. However, learning such concepts often takes a long time and state-of-the-art approaches provide limited support for literal data values, although they are crucial for many applications. In this paper, we propose EvoLearner - an evolutionary approach to learn ALCQ(D), which is the attributive language with complement (ALC) paired with qualified cardinality restrictions (Q) and data properties (D). We contribute a novel initialization method for the initial population: starting from positive examples (nodes in the knowledge graph), we perform biased random walks and translate them to description logic concepts. Moreover, we improve support for data properties by maximizing information gain when deciding where to split the data. We show that our approach significantly outperforms the state of the art on the benchmarking framework SML-Bench for structured machine learning. Our ablation study confirms that this is due to our novel initialization method and support for data properties. AU - Heindorf, Stefan AU - Blübaum, Lukas AU - Düsterhus, Nick AU - Werner, Till AU - Golani, Varun Nandkumar AU - Demir, Caglar AU - Ngonga Ngomo, Axel-Cyrille ID - 29290 T2 - WWW TI - EvoLearner: Learning Description Logics with Evolutionary Algorithms ER - TY - CONF AB - Smart home systems contain plenty of features that enhance wellbeing in everyday life through artificial intelligence (AI). However, many users feel insecure because they do not understand the AI’s functionality and do not feel they are in control of it. Combining technical, psychological and philosophical views on AI, we rethink smart homes as interactive systems where users can partake in an intelligent agent’s learning. 
Parallel to the goals of explainable AI (XAI), we explored the possibility of user involvement in supervised learning of the smart home to have a first approach to improve acceptance, support subjective understanding and increase perceived control. In this work, we conducted two studies: In an online pre-study, we asked participants about their attitude towards teaching AI via a questionnaire. In the main study, we performed a Wizard of Oz laboratory experiment with human participants, where participants spent time in a prototypical smart home and taught activity recognition to the intelligent agent through supervised learning based on the user’s behaviour. We found that involvement in the AI’s learning phase enhanced the users’ feeling of control, perceived understanding and perceived usefulness of AI in general. The participants reported positive attitudes towards training a smart home AI and found the process understandable and controllable. We suggest that involving the user in the learning phase could lead to better personalisation and increased understanding and control by users of intelligent agents for smart home automation. 
AU - Sieger, Leonie Nora AU - Hermann, Julia AU - Schomäcker, Astrid AU - Heindorf, Stefan AU - Meske, Christian AU - Hey, Celine-Chiara AU - Doğangün, Ayşegül ID - 34674 KW - human-agent interaction KW - smart homes KW - supervised learning KW - participation T2 - International Conference on Human-Agent Interaction TI - User Involvement in Training Smart Home Agents ER - TY - CHAP AU - Zahera, Hamada Mohamed Abdelsamee AU - Heindorf, Stefan AU - Balke, Stefan AU - Haupt, Jonas AU - Voigt, Martin AU - Walter, Carolin AU - Witter, Fabian AU - Ngonga Ngomo, Axel-Cyrille ID - 33738 SN - 0302-9743 T2 - The Semantic Web: ESWC 2022 Satellite Events TI - Tab2Onto: Unsupervised Semantification with Knowledge Graph Embeddings ER - TY - CONF AB - At least 5% of questions submitted to search engines ask about cause-effect relationships in some way. To support the development of tailored approaches that can answer such questions, we construct Webis-CausalQA-22, a benchmark corpus of 1.1 million causal questions with answers. We distinguish different types of causal questions using a novel typology derived from a data-driven, manual analysis of questions from ten large question answering (QA) datasets. Using high-precision lexical rules, we extract causal questions of each type from these datasets to create our corpus. As an initial baseline, the state-of-the-art QA model UnifiedQA achieves a ROUGE-L F1 score of 0.48 on our new benchmark. 
AU - Bondarenko, Alexander AU - Wolska, Magdalena AU - Heindorf, Stefan AU - Blübaum, Lukas AU - Ngonga Ngomo, Axel-Cyrille AU - Stein, Benno AU - Braslavski, Pavel AU - Hagen, Matthias AU - Potthast, Martin ID - 33739 T2 - Proceedings of the 29th International Conference on Computational Linguistics TI - CausalQA: A Benchmark for Causal Question Answering ER - TY - JOUR AU - Pestryakova, Svetlana AU - Vollmers, Daniel AU - Sherif, Mohamed AU - Heindorf, Stefan AU - Saleem, Muhammad AU - Moussallem, Diego AU - Ngonga Ngomo, Axel-Cyrille ID - 29851 JF - Scientific Data TI - CovidPubGraph: A FAIR Knowledge Graph of COVID-19 Publications ER - TY - CONF AB - Manufacturing companies are challenged to make the increasingly complex work processes equally manageable for all employees to prevent an impending loss of competence. In this contribution, an intelligent assistance system is proposed enabling employees to help themselves in the workplace and provide them with competence-related support. This results in increasing the short- and long-term efficiency of problem solving in companies. 
AU - Deppe, Sahar AU - Brandt, Lukas AU - Brünninghaus, Marc AU - Papenkordt, Jörg AU - Heindorf, Stefan AU - Tschirner-Vinke, Gudrun ID - 33957 KW - Assistance system KW - Knowledge graph KW - Information retrieval KW - Neural networks KW - AR TI - AI-Based Assistance System for Manufacturing ER - TY - CONF AU - Zahera, Hamada Mohamed Abdelsamee AU - Heindorf, Stefan AU - Ngonga Ngomo, Axel-Cyrille ID - 29291 T2 - Proceedings of the 11th on Knowledge Capture Conference TI - ASSET: A Semi-supervised Approach for Entity Typing in Knowledge Graphs ER - TY - CHAP AU - Feldhans, Robert AU - Wilke, Adrian AU - Heindorf, Stefan AU - Shaker, Mohammad Hossein AU - Hammer, Barbara AU - Ngonga Ngomo, Axel-Cyrille AU - Hüllermeier, Eyke ID - 29292 SN - 0302-9743 T2 - Intelligent Data Engineering and Automated Learning – IDEAL 2021 TI - Drift Detection in Text Data with Document Embeddings ER - TY - CONF AB - Knowledge graph embedding research has mainly focused on the two smallest normed division algebras, $\mathbb{R}$ and $\mathbb{C}$. Recent results suggest that trilinear products of quaternion-valued embeddings can be a more effective means to tackle link prediction. In addition, models based on convolutions on real-valued embeddings often yield state-of-the-art results for link prediction. In this paper, we investigate a composition of convolution operations with hypercomplex multiplications. We propose the four approaches QMult, OMult, ConvQ and ConvO to tackle the link prediction problem. QMult and OMult can be considered as quaternion and octonion extensions of previous state-of-the-art approaches, including DistMult and ComplEx. ConvQ and ConvO build upon QMult and OMult by including convolution operations in a way inspired by the residual learning framework. We evaluated our approaches on seven link prediction datasets including WN18RR, FB15K-237 and YAGO3-10. 
Experimental results suggest that the benefits of learning hypercomplex-valued vector representations become more apparent as the size and complexity of the knowledge graph grow. ConvO outperforms state-of-the-art approaches on FB15K-237 in MRR, Hit@1 and Hit@3, while QMult, OMult, ConvQ and ConvO outperform state-of-the-art approaches on YAGO3-10 in all metrics. Results also suggest that link prediction performances can be further improved via prediction averaging. To foster reproducible research, we provide an open-source implementation of our approaches, including training and evaluation scripts as well as pretrained models. AU - Demir, Caglar AU - Moussallem, Diego AU - Heindorf, Stefan AU - Ngonga Ngomo, Axel-Cyrille ID - 29287 T2 - The 13th Asian Conference on Machine Learning, ACML 2021 TI - Convolutional Hypercomplex Embeddings for Link Prediction ER - TY - CONF AU - Nickchen, Tobias AU - Heindorf, Stefan AU - Engels, Gregor ID - 29294 T2 - 2021 IEEE Winter Conference on Applications of Computer Vision (WACV) TI - Generating Physically Sound Training Data for Image Recognition of Additively Manufactured Parts ER - TY - GEN AU - Heindorf, Stefan ID - 33733 TI - Automatically generating instructions from tutorials for search and user navigation ER - TY - CONF AU - Heindorf, Stefan AU - Scholten, Yan AU - Wachsmuth, Henning AU - Ngonga Ngomo, Axel-Cyrille AU - Potthast, Martin ID - 20141 T2 - Proceedings of the 28th ACM International Conference on Information and Knowledge Management (CIKM 2020) TI - CauseNet: Towards a Causality Graph Extracted from the Web ER - TY - CONF AU - Heindorf, Stefan AU - Scholten, Yan AU - Engels, Gregor AU - Potthast, Martin ID - 7668 T2 - WWW TI - Debiasing Vandalism Detection Models at Wikidata ER - TY - THES AU - Heindorf, Stefan ID - 15333 TI - Vandalism Detection in Crowdsourced Knowledge Bases ER - TY - CONF AU - Heindorf, Stefan AU - Scholten, Yan AU - Engels, Gregor AU - Potthast, Martin ID - 14568 T2 - INFORMATIK TI - Debiasing 
Vandalism Detection Models at Wikidata (Extended Abstract) ER - TY - CONF AB - Many websites offer links to social media sites for convenient content sharing. Unfortunately, those sharing capabilities are quite restricted and it is seldom possible to share content with other services, like those provided by a user's favorite applications or smart devices. In this paper, we present Semantic Data Mediator (SDM) --- a flexible middleware linking a vast number of services to millions of websites. Based on reusable repositories of service descriptions defined by the crowd, users can easily fill a personal registry with their favorite services, which can then be linked to websites by SDM. For this, SDM leverages semantic data, which is already available on millions of websites due to search engine optimization. Further support for our approach from website or service developers is not required. To enable the use of a broad range of services, data conversion services are automatically composed by SDM to transform data according to the needs of the different services. In addition to linking web services, various service adapters allow services of applications and smart devices to be linked as well. We have fully implemented our approach and present a real-world case study demonstrating its feasibility and usefulness. AU - Wolters, Dennis AU - Heindorf, Stefan AU - Kirchhoff, Jonas AU - Engels, Gregor ED - Braubach, Lars ED - Murillo, Juan M. ED - Kaviani, Nima ED - Lama, Manuel ED - Burgueño, Loli ED - Moha, Naouel ED - Oriol, Marc ID - 5831 SN - 978-3-319-91764-1 T2 - Service-Oriented Computing -- ICSOC 2017 Workshops TI - Semantic Data Mediator: Linking Services to Websites ER - TY - CONF AU - Heindorf, Stefan AU - Potthast, Martin AU - Bast, Hannah AU - Buchhold, Björn AU - Haussmann, Elmar ID - 6721 T2 - WSDM TI - WSDM Cup 2017: Vandalism Detection and Triple Scoring ER - TY - CONF AB - Websites increasingly embed semantic data for search engine optimization. 
The most common ontology for semantic data, schema.org, is supported by all major search engines and describes over 500 data types, including calendar events, recipes, products, and TV shows. As of today, users wishing to pass this data to their favorite applications, e.g., their calendars, cookbooks, price comparison applications or even smart devices such as TV receivers, rely on cumbersome and error-prone workarounds such as reentering the data or a series of copy and paste operations. In this paper, we present Semantic Data Mediator (SDM), an approach that allows the easy transfer of semantic data to a multitude of services, ranging from web services to applications installed on different devices. SDM extracts semantic data from the currently displayed web page on the client-side, offers suitable services to the user, and by the press of a button, forwards this data to the desired service while doing all the necessary data conversion and service interface adaptation in between. To realize this, we built a reusable repository of service descriptions, data converters, and service adapters, which can be extended by the crowd. Our approach for linking services to websites relies solely on semantic data and does not require any additional support by either website or service developers. We have fully implemented our approach and present a real-world case study demonstrating its feasibility and usefulness. AU - Wolters, Dennis AU - Heindorf, Stefan AU - Kirchhoff, Jonas AU - Engels, Gregor ED - Altintas, Ilkay ED - Chen, Shiping ID - 5829 KW - Services KW - Websites KW - Semantic Data KW - schema.org KW - Data Conversion KW - Interface Adaptation KW - Mediation SN - 9781538607527 T2 - 2017 IEEE International Conference on Web Services (ICWS) TI - Linking Services to Websites by Leveraging Semantic Data ER - TY - CONF AB - We report on the Wikidata vandalism detection task at the WSDM Cup 2017. 
The task received five submissions; this paper describes their evaluation and a comparison to state-of-the-art baselines. Unlike previous work, we recast Wikidata vandalism detection as an online learning problem, requiring participant software to predict vandalism in near real-time. The best-performing approach achieves a ROC-AUC of 0.947 at a PR-AUC of 0.458. In particular, this task was organized as a software submission task: to maximize reproducibility as well as to foster future research and development on this task, the participants were asked to submit their working software to the TIRA experimentation platform along with the source code for open source release. AU - Heindorf, Stefan AU - Potthast, Martin AU - Engels, Gregor AU - Stein, Benno ID - 6722 T2 - WSDM Cup 2017 Notebook Papers TI - Overview of the Wikidata Vandalism Detection Task at WSDM Cup 2017 ER - TY - GEN AB - The WSDM Cup 2017 was a data mining challenge held in conjunction with the 10th International Conference on Web Search and Data Mining (WSDM). It addressed key challenges of knowledge bases today: quality assurance and entity search. For quality assurance, we tackle the task of vandalism detection, based on a dataset of more than 82 million user-contributed revisions of the Wikidata knowledge base, all of which are annotated with regard to whether or not they are vandalism. For entity search, we tackle the task of triple scoring, using a dataset that comprises relevance scores for triples from type-like relations including occupation and country of citizenship, based on about 10,000 human relevance judgements. For the sake of reproducibility, participants were asked to submit their software on TIRA, a cloud-based evaluation platform, and they were incentivized to share their approaches open source. 
AU - Potthast, Martin AU - Heindorf, Stefan AU - Bast, Hannah ID - 33732 T2 - arXiv:1712.09528 TI - Proceedings of the WSDM Cup 2017: Vandalism Detection and Triple Scoring ER - TY - CONF AB - Wikidata is the new, large-scale knowledge base of the Wikimedia Foundation. Its knowledge is increasingly used within Wikipedia itself and various other kinds of information systems, imposing high demands on its integrity. Wikidata can be edited by anyone and, unfortunately, it frequently gets vandalized, exposing all information systems using it to the risk of spreading vandalized and falsified information. In this paper, we present a new machine learning-based approach to detect vandalism in Wikidata. We propose a set of 47 features that exploit both content and context information, and we report on 4 classifiers of increasing effectiveness tailored to this learning task. Our approach is evaluated on the recently published Wikidata Vandalism Corpus WDVC-2015 and it achieves an area under curve value of the receiver operating characteristic, ROC-AUC, of 0.991. It significantly outperforms the state of the art represented by the rule-based Wikidata Abuse Filter (0.865 ROC-AUC) and a prototypical vandalism detector recently introduced by Wikimedia within the Objective Revision Evaluation Service (0.859 ROC-AUC). AU - Heindorf, Stefan AU - Potthast, Martin AU - Stein, Benno AU - Engels, Gregor ID - 137 T2 - Proceedings of the 25th International Conference on Information and Knowledge Management (CIKM 2016) TI - Vandalism Detection in Wikidata ER - TY - CONF AU - Heindorf, Stefan AU - Potthast, Martin AU - Stein, Benno AU - Engels, Gregor ID - 6719 SN - 9781450336215 T2 - SIGIR TI - Towards Vandalism Detection in Knowledge Bases ER - TY - CONF AU - Böttcher, Stefan AU - Hartel, Rita AU - Heindorf, Stefan ID - 6720 T2 - ADC TI - Optimized XPath evaluation for Schema-compressed XML data VL - 124 ER -