TY - GEN AU - Riebler, Heinrich ID - 521 KW - coldboot TI - Identifikation und Wiederherstellung von kryptographischen Schlüsseln mit FPGAs ER - TY - CONF AB - Cold-boot attacks exploit the fact that DRAM contents are not immediately lost when a PC is powered off. Instead the contents decay rather slowly, in particular if the DRAM chips are cooled to low temperatures. This effect opens an attack vector on cryptographic applications that keep decrypted keys in DRAM. An attacker with access to the target computer can reboot it or remove the RAM modules and quickly copy the RAM contents to non-volatile memory. By exploiting the known cryptographic structure of the cipher and layout of the key data in memory, in our application an AES key schedule with redundancy, the resulting memory image can be searched for sections that could correspond to decayed cryptographic keys; then, the attacker can attempt to reconstruct the original key. However, the runtime of these algorithms grows rapidly with increasing memory image size, error rate and complexity of the bit error model, which limits the practicability of the approach.In this work, we study how the algorithm for key search can be accelerated with custom computing machines. We present an FPGA-based architecture on a Maxeler dataflow computing system that outperforms a software implementation up to 205x, which significantly improves the practicability of cold-attacks against AES. AU - Riebler, Heinrich AU - Kenter, Tobias AU - Sorge, Christoph AU - Plessl, Christian ID - 528 KW - coldboot T2 - Proceedings of the International Conference on Field-Programmable Technology (FPT) TI - FPGA-accelerated Key Search for Cold-Boot Attacks against AES ER - TY - CONF AB - In this paper we introduce “On-The-Fly Computing”, our vision of future IT services that will be provided by assembling modular software components available on world-wide markets. After suitable components have been found, they are automatically integrated, configured and brought to execution in an On-The-Fly Compute Center. We envision that these future compute centers will continue to leverage three current trends in large scale computing which are an increasing amount of parallel processing, a trend to use heterogeneous computing resources, and—in the light of rising energy cost—energy-efficiency as a primary goal in the design and operation of computing systems. In this paper, we point out three research challenges and our current work in these areas. AU - Happe, Markus AU - Kling, Peter AU - Plessl, Christian AU - Platzner, Marco AU - Meyer auf der Heide, Friedhelm ID - 505 T2 - Proceedings of the 9th IEEE Workshop on Software Technology for Future embedded and Ubiquitous Systems (SEUS) TI - On-The-Fly Computing: A Novel Paradigm for Individualized IT Services ER - TY - CONF AU - Suess, Tim AU - Schoenrock, Andrew AU - Meisner, Sebastian AU - Plessl, Christian ID - 1787 SN - 978-0-7695-4979-8 T2 - Proc. Int. Symp. on Parallel and Distributed Processing Workshops (IPDPSW) TI - Parallel Macro Pipelining on the Intel SCC Many-Core Computer ER - TY - CONF AU - Grunzke, Richard AU - Birkenheuer, Georg AU - Blunk, Dirk AU - Breuers, Sebastian AU - Brinkmann, André AU - Gesing, Sandra AU - Herres-Pawlis, Sonja AU - Kohlbacher, Oliver AU - Krüger, Jens AU - Kruse, Martin AU - Müller-Pfefferkorn, Ralph AU - Schäfer, Patrick AU - Schuller, Bernd AU - Steinke, Thomas AU - Zink, Andreas ID - 2107 T2 - Proc. UNICORE Summit TI - A Data Driven Science Gateway for Computational Workflows ER - TY - GEN AU - Plessl, Christian AU - Platzner, Marco AU - Agne, Andreas AU - Happe, Markus AU - Lübbers, Enno ID - 587 TI - Programming models for reconfigurable heterogeneous multi-cores ER - TY - CONF AB - Although the benefits of FPGAs for accelerating scientific codes are widely acknowledged, the use of FPGA accelerators in scientific computing is not widespread because reaping these benefits requires knowledge of hardware design methods and tools that is typically not available with domain scientists. A promising but hardly investigated approach is to develop tool flows that keep the common languages for scientific code (C,C++, and Fortran) and allow the developer to augment the source code with OpenMPlike directives for instructing the compiler which parts of the application shall be offloaded the FPGA accelerator. In this work we study whether the promise of effective FPGA acceleration with an OpenMP-like programming effort can actually be held. Our target system is the Convey HC-1 reconfigurable computer for which an OpenMP-like programming environment exists. As case study we use an application from computational nanophotonics. Our results show that a developer without previous FPGA experience could create an FPGA-accelerated application that is competitive to an optimized OpenMP-parallelized CPU version running on a two socket quad-core server. Finally, we discuss our experiences with this tool flow and the Convey HC-1 from a productivity and economic point of view. AU - Meyer, Björn AU - Schumacher, Jörn AU - Plessl, Christian AU - Förstner, Jens ID - 2106 KW - funding-upb-forschungspreis KW - funding-maxup KW - tet_topic_hpc T2 - Proc. Int. Conf. on Field Programmable Logic and Applications (FPL) TI - Convey Vector Personalities – FPGA Acceleration with an OpenMP-like Effort? ER - TY - JOUR AU - Schumacher, Tobias AU - Plessl, Christian AU - Platzner, Marco ID - 2108 IS - 2 JF - Microprocessors and Microsystems KW - funding-altera SN - 0141-9331 TI - IMORC: An Infrastructure and Architecture Template for Implementing High-Performance Reconfigurable FPGA Accelerators VL - 36 ER - TY - CONF AB - Due to the continuously shrinking device structures and increasing densities of FPGAs, thermal aspects have become the new focus for many research projects over the last years. Most researchers rely on temperature simulations to evaluate their novel thermal management techniques. However, the accuracy of the simulations is to some extent questionable and they require a high computational effort if a detailed thermal model is used.For experimental evaluation of real-world temperature management methods, often synthetic heat sources are employed. Therefore, in this paper we investigated the question if we can create significant rises in temperature on modern FPGAs to enable future evaluation of thermal management techniques based on experiments in contrast to simulations. Therefore, we have developed eight different heat-generating cores that use different subsets of the FPGA resources. Our experimental results show that, according to the built-in thermal diode of our Xilinx Virtex-5 FPGA, we can increase the chip temperature by 134 degree C in less than 12 minutes by only utilizing about 21% of the slices. AU - Happe, Markus AU - Hangmann, Hendrik AU - Agne, Andreas AU - Plessl, Christian ID - 615 T2 - Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig) TI - Eight Ways to put your FPGA on Fire – A Systematic Study of Heat Generators ER - TY - CONF AB - One major obstacle for a wide spread FPGA usage in general-purpose computing is the development tool flow that requires much higher effort than for pure software solutions. Convey Computer promises a solution to this problem for their HC-1 platform, where the FPGAs are configured to run as a vector processor and the software source code can be annotated with pragmas that guide an automated vectorization process. We investigate this approach for a stereo matching algorithm that has abundant parallelism and a number of different computational patterns. We note that for this case study the automated vectorization in its current state doesn’t hold its productivity promise. However, we also show that using the Vector Personality can yield a significant speedups compared to CPU implementations in two of three investigated phases of the algorithm. Those speedups don’t match custom FPGA implementations, but can come with much reduced development effort. AU - Kenter, Tobias AU - Plessl, Christian AU - Schmitz, Henning ID - 591 T2 - Proceedings of the International Conference on ReConFigurable Computing and FPGAs (ReConFig) TI - Pragma based parallelization - Trading hardware efficiency for ease of use? ER - TY - CONF AB - Today's design and operation principles and methods do not scale well with future reconfigurable computing systems due to an increased complexity in system architectures and applications, run-time dynamics and corresponding requirements. Hence, novel design and operation principles and methods are needed that possibly break drastically with the static ones we have built into our systems and the fixed abstraction layers we have cherished over the last decades. Thus, we propose a HW/SW platform that collects and maintains information about its state and progress which enables the system to reason about its behavior (self-awareness) and utilizes its knowledge to effectively and autonomously adapt its behavior to changing requirements (self-expression).To enable self-awareness, our compute nodes collect information using a variety of sensors, i.e. performance counters and thermal diodes, and use internal self-awareness models that process these information. For self-awareness, on-line learning is crucial such that the node learns and continuously updates its models at run-time to react to changing conditions. To enable self-expression, we break with the classic design-time abstraction layers of hardware, operating system and software. In contrast, our system is able to vertically migrate functionalities between the layers at run-time to exploit trade-offs between abstraction and optimization.This paper presents a heterogeneous multi-core architecture, that enables self-awareness and self-expression, an operating system for our proposed hardware/software platform and a novel self-expression method. AU - Happe, Markus AU - Agne, Andreas AU - Plessl, Christian AU - Platzner, Marco ID - 609 T2 - Proceedings of the Workshop on Self-Awareness in Reconfigurable Computing Systems (SRCS) TI - Hardware/Software Platform for Self-aware Compute Nodes ER - TY - CONF AB - Heterogeneous machines are gaining momentum in the High Performance Computing field, due to the theoretical speedups and power consumption. In practice, while some applications meet the performance expectations, heterogeneous architectures still require a tremendous effort from the application developers. This work presents a code generation method to port codes into heterogeneous platforms, based on transformations of the control flow into function calls. The results show that the cost of the function-call mechanism is affordable for the tested HPC kernels. The complete toolchain, based on the LLVM compiler infrastructure, is fully automated once the sequential specification is provided. AU - Barrio, Pablo AU - Carreras, Carlos AU - Sierra, Roberto AU - Kenter, Tobias AU - Plessl, Christian ID - 567 T2 - Proceedings of the International Conference on High Performance Computing and Simulation (HPCS) TI - Turning control flow graphs into function calls: Code generation for heterogeneous architectures ER - TY - CONF AB - While numerous publications have presented ring oscillator designs for temperature measurements a detailed study of the ring oscillator's design space is still missing. In this work, we introduce metrics for comparing the performance and area efficiency of ring oscillators and a methodology for determining these metrics. As a result, we present a systematic study of the design space for ring oscillators for a Xilinx Virtex-5 platform FPGA. AU - Rüthing, Christoph AU - Happe, Markus AU - Agne, Andreas AU - Plessl, Christian ID - 612 T2 - Proceedings of the International Conference on Field Programmable Logic and Applications (FPL) TI - Exploration of Ring Oscillator Design Space for Temperature Measurements on FPGAs ER - TY - CONF AU - Beisel, Tobias AU - Wiersema, Tobias AU - Plessl, Christian AU - Brinkmann, André ID - 2180 KW - funding-enhance T2 - Proc. Workshop on Computer Architecture and Operating System Co-design (CAOS) TI - Programming and Scheduling Model for Supporting Heterogeneous Accelerators in Linux ER - TY - JOUR AU - Grad, Mariusz AU - Plessl, Christian ID - 2177 JF - Int. Journal of Reconfigurable Computing (IJRC) TI - On the Feasibility and Limitations of Just-In-Time Instruction Set Extension for FPGA-based Reconfigurable Processors ER - TY - CONF AU - Kenter, Tobias AU - Plessl, Christian AU - Platzner, Marco AU - Kauschke, Michael ID - 2191 KW - funding-intel T2 - Intel European Research and Innovation Conference TI - Estimation and Partitioning for CPU-Accelerator Architectures ER - TY - CHAP AU - Plessl, Christian AU - Platzner, Marco ED - Khalgui, Mohamed ED - Hanisch, Hans-Michael ID - 2202 SN - 978-1-60960-086-0 T2 - Reconfigurable Embedded Control Systems: Applications for Flexibility and Agility TI - Hardware Virtualization on Dynamically Reconfigurable Embedded Processors ER - TY - CHAP AU - Sekanina, Lukas AU - Walker, James Alfred AU - Kaufmann, Paul AU - Plessl, Christian AU - Platzner, Marco ID - 10737 T2 - Cartesian Genetic Programming TI - Evolution of Electronic Circuits ER - TY - CONF AU - Meyer, Björn AU - Plessl, Christian AU - Förstner, Jens ID - 2194 KW - tet_topic_hpc T2 - Symp. on Application Accelerators in High Performance Computing (SAAHPC) TI - Transformation of scientific algorithms to parallel computing code: subdomain support in a MPI-multi-GPU backend ER - TY - CONF AU - Beisel, Tobias AU - Wiersema, Tobias AU - Plessl, Christian AU - Brinkmann, André ID - 2193 T2 - Proc. Int. Conf. on Application-Specific Systems, Architectures, and Processors (ASAP) TI - Cooperative multitasking for heterogeneous accelerators in the Linux Completely Fair Scheduler ER - TY - CONF AB - In the next decades, hybrid multi-cores will be the predominant architecture for reconfigurable FPGA-based systems. Temperature-aware thread mapping strategies are key for providing dependability in such systems. These strategies rely on measuring the temperature distribution and redicting the thermal behavior of the system when there are changes to the hardware and software running on the FPGA. While there are a number of tools that use thermal models to predict temperature distributions at design time, these tools lack the flexibility to autonomously adjust to changing FPGA configurations. To address this problem we propose a temperature-aware system that empowers FPGA-based reconfigurable multi-cores to autonomously predict the on-chip temperature distribution for pro-active thread remapping. Our system obtains temperature measurements through a self-calibrating grid of sensors and uses area constrained heat-generating circuits in order to generate spatial and temporal temperature gradients. The generated temperature variations are then used to learn the free parameters of the system's thermal model. The system thus acquires an understanding of its own thermal characteristics. We implemented an FPGA system containing a net of 144 temperature sensors on a Xilinx Virtex-6 LX240T FPGA that is aware of its thermal model. Finally, we show that the temperature predictions vary less than 0.72 degree C on average compared to the measured temperature distributions at run-time. AU - Happe, Markus AU - Agne, Andreas AU - Plessl, Christian ID - 656 T2 - Proceedings of the 2011 International Conference on Reconfigurable Computing and FPGAs (ReConFig) TI - Measuring and Predicting Temperature Distributions on FPGAs at Run-Time ER - TY - CONF AU - Kenter, Tobias AU - Platzner, Marco AU - Plessl, Christian AU - Kauschke, Michael ID - 2200 KW - design space exploration KW - LLVM KW - partitioning KW - performance KW - estimation KW - funding-intel SN - 978-1-4503-0554-9 T2 - Proc. Int. Symp. on Field-Programmable Gate Arrays (FPGA) TI - Performance Estimation Framework for Automated Exploration of CPU-Accelerator Architectures ER - TY - JOUR AU - Schumacher, Tobias AU - Süß, Tim AU - Plessl, Christian AU - Platzner, Marco ID - 2201 JF - Int. Journal of Recon- figurable Computing (IJRC) KW - funding-altera TI - FPGA Acceleration of Communication-bound Streaming Applications: Architecture Modeling and a 3D Image Compositing Case Study ER - TY - CONF AU - Grad, Mariusz AU - Plessl, Christian ID - 2198 T2 - Proc. Reconfigurable Architectures Workshop (RAW) TI - Just-in-time Instruction Set Extension – Feasibility and Limitations for an FPGA-based Reconfigurable ASIP Architecture ER - TY - CONF AU - Lübbers, Enno AU - Platzner, Marco AU - Plessl, Christian AU - Keller, Ariane AU - Plattner, Bernhard ID - 2223 SN - 1-60132-140-6 T2 - Proc. Int. Conf. on Engineering of Reconfigurable Systems and Algorithms (ERSA) TI - Towards Adaptive Networking for Embedded Devices based on Reconfigurable Hardware ER - TY - CONF AU - Grad, Mariusz AU - Plessl, Christian ID - 2216 T2 - Proc. Int. Conf. on ReConFigurable Computing and FPGAs (ReConFig) TI - Pruning the Design Space for Just-In-Time Processor Customization ER - TY - CONF AU - Grad, Mariusz AU - Plessl, Christian ID - 2224 SN - 1-60132-140-6 T2 - Proc. Int. Conf. on Engineering of Reconfigurable Systems and Algorithms (ERSA) TI - An Open Source Circuit Library with Benchmarking Facilities ER - TY - CONF AU - Andrews, David AU - Plessl, Christian ID - 2220 SN - 1-60132-140-6 T2 - Proc. Int. Conf. on Engineering of Reconfigurable Systems and Algorithms (ERSA) TI - Configurable Processor Architectures: History and Trends ER - TY - GEN ED - Plaks, Toomas P. ED - Andrews, David ED - DeMara, Ronald ED - Lam, Herman ED - Lee, Jooheung ED - Plessl, Christian ED - Stitt, Greg ID - 2222 SN - 1-60132-140-6 TI - Proc. Int. Conf. on Engineering of Reconfigurable Systems and Algorithms (ERSA) ER - TY - CONF AU - Beisel, Tobias AU - Niekamp, Manuel AU - Plessl, Christian ID - 2226 SN - 978-1-4244-6965-9 T2 - Proc. Int. Conf. on Application-Specific Systems, Architectures, and Processors (ASAP) TI - Using Shared Library Interposing for Transparent Acceleration in Systems with Heterogeneous Hardware Accelerators ER - TY - CONF AU - Keller, Ariane AU - Plattner, Bernhard AU - Lübbers, Enno AU - Platzner, Marco AU - Plessl, Christian ID - 2206 SN - 978-1-4244-8864-3 T2 - Proc. IEEE Globecom Workshop on Network of the Future (FutureNet) TI - Reconfigurable Nodes for Future Networks ER - TY - CONF AU - Woehrle, Matthias AU - Plessl, Christian AU - Thiele, Lothar ID - 2227 SN - 978-1-4244-7911-5 T2 - Proc. Int. Conf. Networked Sensing Systems (INSS) TI - Rupeas: Ruby Powered Event Analysis DSL ER - TY - CONF AU - Kenter, Tobias AU - Platzner, Marco AU - Plessl, Christian AU - Kauschke, Michael ED - Hammami, Omar ED - Larrabee, Sandra ID - 2228 T2 - Proc. Workshop on Architectural Research Prototyping (WARP), International Symposium on Computer Architecture (ISCA) TI - Performance Estimation for the Exploration of CPU-Accelerator Architectures ER - TY - GEN AB - Wireless Sensor Networks (WSNs) are unique embedded computation systems for distributed sensing of a dispersed phenomenon. While being a strongly concurrent distributed system, its embedded aspects with severe resource limitations and the wireless communication requires a fusion of technologies and methodologies from very different fields. As WSNs are deployed in remote locations for long-term unattended operation, assurance of correct functioning of the system is of prime concern. Thus, the design and development of WSNs requires specialized tools to allow for testing and debugging the system. To this end, we present a framework for analyzing and checking WSNs based on collected events during system operation. It allows for abstracting from the event trace by means of behavioral queries and uses assertions for checking the accordance of an execution to its specification. The framework is independent from WSN test platforms, applications and logging semantics and thus generally applicable for analyzing event logs of WSN test executions. AU - Woehrle, Matthias AU - Plessl, Christian AU - Thiele, Lothar ID - 2353 KW - Rupeas KW - DSL KW - WSN KW - testing TI - Rupeas: Ruby Powered Event Analysis DSL ER - TY - CONF AB - Mapping applications that consist of a collection of cores to FPGA accelerators and optimizing their performance is a challenging task in high performance reconfigurable computing. We present IMORC, an architectural template and highly versatile on-chip interconnect. IMORC links provide asynchronous FIFOs and bitwidth conversion which allows for flexibly composing accelerators from cores running at full speed within their own clock domains, thus facilitating the re-use of cores and portability. Further, IMORC inserts performance counters for monitoring runtime data. In this paper, we first introduce the IMORC architectural template and the on-chip interconnect, and then demonstrate IMORC on the example of accelerating the k-th nearest neighbor thinning problem on an XD1000 reconfigurable computing system. Using IMORC's monitoring infrastructure, we gain insights into the data-dependent behavior of the application which, in turn, allow for optimizing the accelerator. AU - Schumacher, Tobias AU - Plessl, Christian AU - Platzner, Marco ID - 2350 KW - IMORC KW - interconnect KW - performance SN - 978-1-4244-4450-2 T2 - Proc. Int. Symp. on Field-Programmable Custom Computing Machines (FCCM) TI - IMORC: Application Mapping, Monitoring and Optimization for High-Performance Reconfigurable Computing ER - TY - CONF AB - In this work we present EvoCache, a novel approach for implementing application-specific caches. The key innovation of EvoCache is to make the function that maps memory addresses from the CPU address space to cache indices programmable. We support arbitrary Boolean mapping functions that are implemented within a small reconfigurable logic fabric. For finding suitable cache mapping functions we rely on techniques from the evolvable hardware domain and utilize an evolutionary optimization procedure. We evaluate the use of EvoCache in an embedded processor for two specific applications (JPEG and BZIP2 compression) with respect to execution time, cache miss rate and energy consumption. We show that the evolvable hardware approach for optimizing the cache functions not only significantly improves the cache performance for the training data used during optimization, but that the evolved mapping functions generalize very well. Compared to a conventional cache architecture, EvoCache applied to test data achieves a reduction in execution time of up to 14.31% for JPEG (10.98% for BZIP2), and in energy consumption by 16.43% for JPEG (10.70% for BZIP2). We also discuss the integration of EvoCache into the operating system and show that the area and delay overheads introduced by EvoCache are acceptable. AU - Kaufmann, Paul AU - Plessl, Christian AU - Platzner, Marco ID - 2262 KW - EvoCache KW - evolvable hardware KW - computer architecture T2 - Proc. NASA/ESA Conference on Adaptive Hardware and Systems (AHS) TI - EvoCaches: Application-specific Adaptation of Cache Mapping ER - TY - CONF AU - Beutel, Jan AU - Gruber, Stephan AU - Hasler, Andi AU - Lim, Roman AU - Meier, Andreas AU - Plessl, Christian AU - Talzi, Igor AU - Thiele, Lothar AU - Tschudin, Christian AU - Woehrle, Matthias AU - Yuecel, Mustafa ID - 2352 KW - WSN KW - PermaSense SN - 978-1-4244-5108-1 T2 - Proc. Int. Conf. on Information Processing in Sensor Networks (IPSN) TI - PermaDAQ: A Scientific Instrument for Precision Sensing and Data Recovery in Environmental Extremes ER - TY - CONF AU - Schumacher, Tobias AU - Süß, Tim AU - Plessl, Christian AU - Platzner, Marco ID - 2238 KW - IMORC KW - graphics SN - 978-0-7695-3917-1 T2 - Proc. Int. Conf. on ReConFigurable Computing and FPGAs (ReConFig) TI - Communication Performance Characterization for Reconfigurable Accelerator Design on the XD1000 ER - TY - CONF AU - Schumacher, Tobias AU - Plessl, Christian AU - Platzner, Marco ID - 2261 KW - IMORC KW - NOC KW - KNN KW - accelerator SN - 1946-1488 T2 - Proc. Int. Conf. on Field Programmable Logic and Applications (FPL) TI - An Accelerator for k-th Nearest Neighbor Thinning Based on the IMORC Infrastructure ER - TY - CONF AB - In this paper, we introduce the Woolcano reconfigurable processor architecture. The architecture is based on the Xilinx Virtex-4 FX FPGA and leverages the Auxiliary Processing Unit (APU) as well as the partial reconfiguration capabilities to provide dynamically reconfigurable custom instructions. We also present a hardware tool flow that automatically translates software functions into custom instructions and a software tool flow that creates binaries using these instructions. While previous research on processors with reconfigurable functional units has been performed predominantly with simulation, the Woolcano architecture allows for exploring dynamic instruction set extension with commercially available hardware. Finally, we present a case study demonstrating a custom floating-point instruction generated with our approach, which achieves a 40x speedup over software-emulated floating-point operations and a 21% speedup over the Xilinx hardware floating-point unit. AU - Grad, Mariusz AU - Plessl, Christian ID - 2263 SN - 1-60132-101-5 T2 - Proc. Int. Conf. on Engineering of Reconfigurable Systems and Algorithms (ERSA) TI - Woolcano: An Architecture and Tool Flow for Dynamic Instruction Set Extension on Xilinx Virtex-4 FX ER - TY - CONF AU - Woehrle, Matthias AU - Plessl, Christian AU - Lim, Roman AU - Beutel, Jan AU - Thiele, Lothar ID - 2370 KW - WSN KW - testing KW - verification SN - 978-0-7695-3158-8 T2 - IEEE Int. Conf. on Sensor Networks, Ubiquitous, and Trustworthy Computing (SUTC) TI - EvAnT: Analysis and Checking of event traces for Wireless Sensor Networks ER - TY - CONF AU - Schumacher, Tobias AU - Meiche, Robert AU - Kaufmann, Paul AU - Lübbers, Enno AU - Plessl, Christian AU - Platzner, Marco ID - 2364 SN - 1-60132-064-7 T2 - Proc. Int. Conf. on Engineering of Reconfigurable Systems and Algorithms (ERSA) TI - A Hardware Accelerator for k-th Nearest Neighbor Thinning ER - TY - CONF AU - Schumacher, Tobias AU - Plessl, Christian AU - Platzner, Marco ID - 2372 KW - IMORC KW - IP core KW - interconnect T2 - Many-core and Reconfigurable Supercomputing Conference (MRSC) TI - IMORC: An infrastructure for performance monitoring and optimization of reconfigurable computers ER - TY - GEN AU - Beutel, Jan AU - Plessl, Christian AU - Woehrle, Matthias ID - 2394 TI - Increasing the Reliability of Wireless Sensor Networks with a Unit Testing Framework ER - TY - CONF AU - Woehrle, Matthias AU - Plessl, Christian AU - Beutel, Jan AU - Thiele, Lothar ID - 2392 KW - WSN KW - testing KW - distributed KW - embedded SN - 978-1-59593-694-3 T2 - Proc. Workshop on Embedded Networked Sensors (EmNets) TI - Increasing the Reliability of Wireless Sensor Networks with a Distributed Testing Framework ER - TY - CONF AU - Beutel, Jan AU - Dyer, Matthias AU - Lim, Roman AU - Plessl, Christian AU - Woehrle, Matthias AU - Yuecel, Mustafa AU - Thiele, Lothar ID - 2393 KW - WSN KW - testing KW - verification SN - 1-4244-1231-5 T2 - Proc. Int. Conf. Networked Sensing Systems (INSS) TI - Automated Wireless Sensor Network Testing ER - TY - THES AB - In this thesis, we propose to use a reconfigurable processor as main computation element in embedded systems for applications from the multi-media and communications domain. A reconfigurable processor integrates an embedded CPU core with a Reconfigurable Processing Unit (RPU). Many of our target applications require real-time signal-processing of data streams and expose a high computational demand. The key challenge in designing embedded systems for these applications is to find an implementation that satisfies the performance goals and is adaptable to new applications, while the system cost is minimized. Implementations that solely use an embedded CPU are likely to miss the performance goals. Application-Specific Integrated Circuit (ASIC)-based coprocessors can be used for some high-volume products with fixed functions, but fall short for systems with varying applications. We argue that a reconfigurable processor with a coarse-grained, dynamically reconfigurable array of modest size provides an attractive implementation platform for our application domain. The computational intensive application kernels are executed on the RPU, while the remaining parts of the application are executed on the CPU. Reconfigurable hardware allows for implementing application specific coprocessors with a high performance, while the function of the coprocessor can still be adapted due to the programmability. So far, reconfigurable technology is used in embedded systems primarily with static configurations, e.g., for implementing glue-logic, replacing ASICs, and for implementing fixed-function coprocessors. Changing the configuration at runtime enables a number of interesting application modes, e.g., on-demand loading of coprocessors and time-multiplexed execution of coprocessors, which is commonly denoted as hardware virtualization. While the use of static configurations is well understood and supported by design-tools, the role of dynamic reconfiguration is not well investigated yet. Current application specification methods and design-tools do not provide an end-to-end tool-flow that considers dynamic reconfiguration. A key idea of our approach is to reduce system cost by keeping the size of the reconfigurable array small and to use hardware virtualization techniques to compensate for the limited hardware resources. The main contribution of this thesis is the codesign of a reconfigurable processor architecture named ZIPPY, the corresponding hardware and software implementation tools, and an application specification model which explicitly considers hardware virtualization. The ZIPPY architecture is widely parametrized and allows for specifying a whole family of processor architectures. The implementation tools are also parametrized and can target any architectural variant. We evaluate the performance of the architecture with a system-level, cycle-accurate cosimulation framework. This framework enables us to perform design-space exploration for a variety of reconfigurable processor architectures. With two case studies, we demonstrate, that hardware virtualization on the Zippy architecture is feasible and enables us to trade-off performance for area in embedded systems. Finally, we present a novel method for optimal temporal partitioning of sequential circuits, which is an important form of hardware virtualization. The method based on Slowdown and Retiming allows us to decompose any sequential circuit into a number of smaller, communicating subcircuits that can be executed on a dynamically reconfigurable architecture. AU - Plessl, Christian ID - 2404 KW - Zippy SN - 978-3-8322-5561-3 TI - Hardware virtualization on a coarse-grained reconfigurable processor ER - TY - CONF AB - This paper presents a novel method for optimal temporal partitioning of sequential circuits for time-multiplexed reconfigurable architectures. The method bases on slowdown and retiming and maximizes the circuit's performance during execution while restricting the size of the partitions to respect the resource constraints of the reconfigurable architecture. We provide a mixed integer linear program (MILP) formulation of the problem, which can be solved exactly. In contrast to related work, our approach optimizes performance directly, takes structural modifications of the circuit into account, and is extensible. We present the application of the new method to temporal partitioning for a coarse-grained reconfigurable architecture. AU - Plessl, Christian AU - Platzner, Marco AU - Thiele, Lothar ID - 2401 KW - temporal partitioning KW - retiming KW - ILP T2 - Proc. Int. Conf. on Field Programmable Technology (ICFPT) TI - Optimal Temporal Partitioning based on Slowdown and Retiming ER - TY - CONF AB - This paper motivates the use of hardware virtualization on coarse-grained reconfigurable architectures. We introduce Zippy, a coarse-grained multi-context hybrid CPU with architectural support for efficient hardware virtualization. The architectural details and the corresponding tool flow are outlined. As a case study, we compare the non-virtualized and the virtualized execution of an ADPCM decoder. AU - Plessl, Christian AU - Platzner, Marco ID - 2411 KW - Zippy T2 - Proc. Int. Conf. on Application-Specific Systems, Architectures, and Processors (ASAP) TI - Zippy – A coarse-grained reconfigurable array with support for hardware virtualization ER - TY - JOUR AB - Reconfigurable architectures that tightly integrate a standard CPU core with a field-programmable hardware structure have recently been receiving impact of these design decisions on the overall system performance is a challenging task. In this paper, we first present a framework for the cycle-accurate performance evaluation of hybrid reconfigurable processors on the system level. Then, we discuss a reconfigurable processor for data-streaming applications, which attaches a coarse-grained reconfigurable unit to the coprocessor interface of a standard embedded CPU core. By means of a case study we evaluate the system-level impact of certain design features for the reconfigurable unit, such as multiple contexts, register replication, and hardware context scheduling. The results illustrate that a system-level evaluation framework is of paramount importance for studying the architectural trade-offs and optimizing design parameters for reconfigurable processors. AU - Enzler, Rolf AU - Plessl, Christian AU - Platzner, Marco ID - 2412 IS - 2-3 JF - Microprocessors and Microsystems KW - FPGA KW - reconfigurable computing KW - co-simulation KW - Zippy TI - System-level performance evaluation of reconfigurable processors VL - 29 ER -