TY - JOUR
AU - Gries, Thomas
AU - Fritz, Marlon
AU - Feng, Yuanhua
ID - 6734
IS - 1
JF - Oxford Bulletin of Economics and Statistics
TI - Growth Trends and Systematic Patterns of Booms and Busts – Testing 200 Years of Business Cycle Dynamics
VL - 81
ER -
TY - CONF
AU - Müller, Michelle
AU - Gutt, Dominik
ID - 6856
T2 - Wirtschaftsinformatik Proceedings 2019
TI - Heart over Heels? An Empirical Analysis of the Relationship between Emotions and Review Helpfulness for Experience and Credence Goods
ER -
TY - CONF
AU - Poniatowski, Martin
AU - Neumann, Jürgen
AU - Görzen, Thomas
AU - Kundisch, Dennis
ID - 6857
T2 - Wirtschaftsinformatik Proceedings 2019
TI - A Semi-Automated Approach for Generating Online Review Templates
ER -
TY - CONF
AU - Afifi, Haitham
AU - Karl, Holger
ID - 6860
T2 - 2019 16th IEEE Annual Consumer Communications & Networking Conference (CCNC 2019)
TI - Power Allocation with a Wireless Multi-cast Aware Routing for Virtual Network Embedding
ER -
TY - CONF
AB - We investigate the maintenance of overlay networks under massive churn, i.e., nodes joining and leaving the network. We assume an adversary that may churn a constant fraction $\alpha n$ of nodes over the course of $\mathcal{O}(\log n)$ rounds. In particular, the adversary has almost up-to-date information about the network topology, as it can observe an only slightly outdated topology that is at least $2$ rounds old. Other than that, we only have the provably minimal restriction that new nodes can only join the network via nodes that have taken part in the network for at least one round. Our contributions are as follows: First, we show that it is impossible to maintain a connected topology if the adversary has up-to-date information about the nodes' connections. Further, we show that our restriction concerning joins is also necessary. As our main result, we present an algorithm that constructs a new overlay, completely independent of all previous overlays, every $2$ rounds.
Furthermore, each node sends and receives only $\mathcal{O}(\log^3 n)$ messages each round. As part of our solution, we propose the Linearized DeBruijn Swarm (LDS), a highly churn-resistant overlay, which will be maintained by the algorithm. However, our approaches can be transferred to a variety of classical P2P topologies where nodes are mapped into the $[0,1)$-interval.
AU - Götte, Thorsten
AU - Vijayalakshmi, Vipin Ravindran
AU - Scheideler, Christian
ID - 6976
T2 - Proceedings of the 2019 IEEE 33rd International Parallel and Distributed Processing Symposium (IPDPS '19)
TI - Always be Two Steps Ahead of Your Enemy - Maintaining a Routable Overlay under Massive Churn with an Almost Up-to-date Adversary
ER -
TY - CONF
AB - FPGA devices have proven to be good candidates for accelerating applications from different research areas. For instance, machine learning applications such as K-Means clustering usually rely on large amounts of data to be processed, and, despite the performance offered by other architectures, FPGAs can offer better energy efficiency. With that in mind, Intel has launched a platform that integrates a multicore and an FPGA in the same package, enabling low-latency and coherent fine-grained data offload. In this paper, we present a parallel implementation of the K-Means clustering algorithm for this novel platform, using the OpenCL language, and compare it against other platforms. We found that the CPU+FPGA platform was more energy efficient than the CPU-only approach by 70.71% to 85.92%, with the Standard and Tiny input sizes respectively, and up to 68.21% performance improvement was obtained with the Tiny input size. Furthermore, it was up to 7.2× more energy efficient than an Intel® Xeon Phi™, 21.5× more than a cluster of Raspberry Pi boards, and 3.8× more than the low-power MPPA-256 architecture, when the Standard input size was used.
AU - Souza, Matheus A.
AU - Maciel, Lucas A.
AU - Penna, Pedro Henrique
AU - Freitas, Henrique C.
ID - 16411
KW - pc2-harp-ressources
SN - 9781538677698
T2 - 2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
TI - Energy Efficient Parallel K-Means Clustering for an Intel® Hybrid Multi-Chip Package
ER -
TY - CONF
AB - In recent years, FPGAs have been successfully employed for the implementation of efficient, application-specific accelerators for a wide range of machine learning tasks. In this work, we consider probabilistic models, namely (Mixed) Sum-Product Networks (SPN), a deep architecture that can provide tractable inference for multivariate distributions over mixed data-sources. We develop a fully pipelined FPGA accelerator architecture, including a pipelined interface to external memory, for the inference in (mixed) SPNs. To meet the precision constraints of SPNs, all computations are conducted using double-precision floating-point arithmetic. Starting from an input description, the custom FPGA accelerator is synthesized fully automatically by our tool flow. To the best of our knowledge, this work is the first approach to offload the SPN inference problem to FPGA-based accelerators. Our evaluation shows that the SPN inference problem benefits from offloading to our pipelined FPGA accelerator architecture.
AU - Sommer, Lukas
AU - Oppermann, Julian
AU - Molina, Alejandro
AU - Binnig, Carsten
AU - Kersting, Kristian
AU - Koch, Andreas
ID - 16413
KW - pc2-harp-ressources
SN - 9781538684771
T2 - 2018 IEEE 36th International Conference on Computer Design (ICCD)
TI - Automatic Mapping of the Sum-Product Network Inference Problem to FPGA-Based Accelerators
ER -
TY - GEN
AU - Lienen, Julian
ID - 16415
TI - Automated Feature Engineering on Time Series Data
ER -
TY - CONF
AB - The performance of High-Level Synthesis (HLS) applications with irregular data structures is limited by the imperative programming paradigm of languages like C/C++.
In this paper, we show that constructing concurrent data structures with channels, a programming construct derived from the CSP (communicating sequential processes) paradigm, is an effective approach to improving the performance of these applications. We evaluate concurrent data structures for FPGAs by synthesizing a K-means clustering (KMC) algorithm on the Intel HARP2 platform. A fully pipelined KMC processing element can be synthesized from OpenCL with the help of an SPSC (single-producer-single-consumer) queue and stack built from channels, achieving a 15.2× speedup over a sequential baseline. The number of processing elements can be scaled up by leveraging an MPMC (multiple-producer-multiple-consumer) stack with work distribution for dynamic load balancing. Evaluation shows that an additional 3.5× speedup can be achieved when 4 processing elements are instantiated. These results show that concurrent data structures built with channels have great potential for improving the parallelism of HLS applications. We hope that our study will stimulate further research into the potential of channel-based HLS.
AU - Yan, Hui
AU - Li, Zhaoshi
AU - Liu, Leibo
AU - Yin, Shouyi
AU - Wei, Shaojun
ID - 16417
KW - pc2-harp-ressources
SN - 9781450361378
T2 - Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
TI - Constructing Concurrent Data Structures on FPGA with Channels
ER -
TY - JOUR
AB - Field-Programmable Gate Arrays (FPGAs) are widely used in the central signal processing design of the Square Kilometre Array (SKA) as hardware accelerators. The frequency domain acceleration search (FDAS) module is an important part of the SKA1-MID pulsar search engine. To develop for yet-to-be-finalized hardware, for cross-discipline interoperability, and to achieve fast prototyping, OpenCL, a high-level FPGA synthesis approach, is employed to create the sub-modules of FDAS.
The FT convolution and harmonic-summing sub-modules, plus some other minor sub-modules, are elements of the FDAS module that have previously been well-optimized separately. In this paper, we explore the design space of combining well-optimized designs, dealing with the ensuing need for trade-offs and compromises. Pipeline computing is employed to handle multiple input arrays at high speed. The hardware target is to employ multiple high-end FPGAs to process the combined FDAS module. The results show interesting consequences, where the best individual solutions are not necessarily the best solutions for the speed of a pipeline in which FPGA resources and memory bandwidth need to be shared. By applying multiple buffering techniques to the pipeline, the combined FDAS module can achieve up to 2× speedup over implementations without pipeline computing. We perform an extensive experimental evaluation on multiple high-end FPGA cards hosted in a workstation and compare to a technologically comparable mid-range GPU.
AU - Wang, Haomiao
AU - Thiagaraj, Prabu
AU - Sinnen, Oliver
ID - 16420
JF - Journal of Astronomical Instrumentation
KW - pc2-harp-ressources
SN - 2251-1717
TI - Combining Multiple Optimized FPGA-based Pulsar Search Modules Using OpenCL
ER -