TY - CONF AU - Awais, Muhammad AU - Ghasemzadeh Mohammadi, Hassan AU - Platzner, Marco ID - 21610 T2 - Proceedings of the ACM Great Lakes Symposium on VLSI (GLSVLSI) 2021 TI - LDAX: A Learning-based Fast Design Space Exploration Framework for Approximate Circuit Synthesis ER - TY - GEN AU - Rehnen, Jakob Werner ID - 22216 TI - Decomposition of Arithmetic Components for the Approximate Circuit Synthesis with EvoApproxLib ER - TY - CONF AB - Approximate computing (AC) has acquired significant maturity in recent years as a promising approach to obtain energy- and area-efficient hardware. Automated approximate accelerator synthesis involves a great deal of complexity because the size of the design space grows exponentially with the number of possible approximations. Design space exploration of approximate accelerator synthesis is usually targeted via heuristic-based search methods. The majority of existing frameworks prune a large part of the design space using a greedy-based approach to keep the problem tractable. Therefore, they result in inferior solutions since many potential solutions are neglected in the pruning process without the possibility of backtracking to removed approximate instances. In this paper, we address the aforementioned issue by adopting Monte Carlo Tree Search (MCTS), an efficient stochastic learning-based search algorithm, in the context of automated synthesis of approximate accelerators. This enables the synthesis frameworks to deeply subsample the design space of approximate accelerator synthesis toward the most promising approximate instances based on the required performance goals, i.e., power consumption, area, and/or delay. We investigated the challenges of providing an efficient open-source framework that benefits analytical and search-based approximation techniques simultaneously to both speed up the synthesis runtime and improve the quality of obtained results. 
In addition, we studied the utilization of machine learning algorithms to improve the performance of several critical steps, e.g., accelerator quality testing, in the synthesis framework. The proposed framework can help the community to rapidly generate efficient approximate accelerators in a reasonable runtime. AU - Awais, Muhammad AU - Platzner, Marco ID - 22309 KW - Approximate computing KW - Design space exploration KW - Accelerator synthesis T2 - Proceedings of IEEE Computer Society Annual Symposium on VLSI TI - MCTS-Based Synthesis Towards Efficient Approximate Accelerators ER - TY - GEN AB - This bachelor thesis presents a C/C++ implementation of the XCS algorithm for an embedded system and profiling results concerning the execution time of the functions. These are then analyzed in relation to the input characteristics of the examined learning environments and compared with related work. Three main conclusions can be drawn from the measured results. First, the maximum size of the population of the classifiers influences the runtime of the genetic algorithm; second, the size of the input space has a direct effect on the execution time of the matching function; and last, a larger action space results in a longer runtime for generating the predictions for the possible actions. The dependencies identified here can serve to optimize the computational efficiency and make XCS more suitable for embedded systems. 
AU - Brede, Mathis ID - 22483 TI - Implementation and Profiling of XCS in the Context of Embedded Systems ER - TY - CONF AU - Witschen, Linus Matthias AU - Wiersema, Tobias AU - Raeisi Nafchi, Masood AU - Bockhorn, Arne AU - Platzner, Marco ED - Hannig, Frank ED - Derrien, Steven ED - Diniz, Pedro ED - Chillet, Daniel ID - 21953 T2 - Proceedings of International Symposium on Applied Reconfigurable Computing (ARC'21) TI - Timing Optimization for Virtual FPGA Configurations ER - TY - JOUR AB - Background: Hand amputation can have a truly debilitating impact on the life of the affected person. A multifunctional myoelectric prosthesis controlled using pattern classification can be used to restore some of the lost motor abilities. However, learning to control an advanced prosthesis can be a challenging task; virtual and augmented reality (AR) provide a means to create engaging and motivating training. Methods: In this study, we present a novel training framework that integrates virtual elements within a real scene (AR) while allowing the view from the first-person perspective. The framework was evaluated in 13 able-bodied subjects and a limb-deficient person divided into intervention (IG) and control (CG) groups. The IG received training by performing a simulated clothespin task, and both groups conducted a pre- and posttest with a real prosthesis. When training with the AR, the subjects received visual feedback on the generated grasping force. The main outcome measure was the number of pins that were successfully transferred within 20 min (task duration), while the numbers of dropped and broken pins were also registered. The participants were asked to score the difficulty of the real task (posttest), fun-factor and motivation, as well as the utility of the feedback. Results: The performance (median/interquartile range) consistently increased during the training sessions (4/3 to 22/4). 
While the results were similar for the two groups in the pretest, the performance improved in the posttest only in IG. In addition, the subjects in IG transferred significantly more pins (28/10.5 versus 14.5/11), and dropped (1/2.5 versus 3.5/2) and broke (5/3.8 versus 14.5/9) significantly fewer pins in the posttest compared to CG. The participants in IG assigned (mean ± std) significantly lower scores to the difficulty compared to CG (5.2 ± 1.9 versus 7.1 ± 0.9), and they highly rated the fun factor (8.7 ± 1.3) and usefulness of feedback (8.5 ± 1.7). Conclusion: The results demonstrated that the proposed AR system allows for the transfer of skills from the simulated to the real task while providing a positive user experience. The present study demonstrates the effectiveness and flexibility of the proposed AR framework. Importantly, the developed system is open source and available for download and further development. AU - Boschmann, Alexander AU - Neuhaus, Dorothee AU - Vogt, Sarah AU - Kaltschmidt, Christian AU - Platzner, Marco AU - Dosen, Strahinja ID - 30906 IS - 1 JF - Journal of NeuroEngineering and Rehabilitation KW - Health Informatics KW - Rehabilitation SN - 1743-0003 TI - Immersive augmented reality system for the training of pattern classification control with a myoelectric prosthesis VL - 18 ER - TY - JOUR AU - Rodriguez, Alfonso AU - Otero, Andres AU - Platzner, Marco AU - De la Torre, Eduardo ID - 30907 JF - IEEE Transactions on Computers KW - Computational Theory and Mathematics KW - Hardware and Architecture KW - Theoretical Computer Science KW - Software SN - 0018-9340 TI - Exploiting Hardware-Based Data-Parallel and Multithreading Models for Smart Edge Computing in Reconfigurable FPGAs ER - TY - CONF AU - Hansmeier, Tim ID - 29137 T2 - HEART '21: Proceedings of the 11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies TI - Self-aware Operation of Heterogeneous Compute Nodes using the Learning Classifier 
System XCS ER - TY - GEN AB - Autonomous mobile robots are becoming increasingly capable and widespread. Reliable obstacle avoidance is an integral part of autonomous navigation. This involves real-time interpretation and processing of a complex environment. Strict time and energy constraints of a mobile autonomous system make efficient computation extremely desirable. The benefits of employing Hardware/Software co-designed applications are obvious and significant. Hardware accelerators are used for efficient processing of the algorithms by exploiting parallelism. FPGAs are a class of hardware accelerators, which can contain hundreds of small execution units, and can be used for Hardware/Software co-designed applications. However, there is a reluctance when it comes to adoption of these devices in well-established application domains, such as robotics, due to the steep learning curve of FPGA application design. ReconROS has successfully bridged the gap between robotic and FPGA application development by providing an intuitive, common development platform for robotic application development for FPGAs. It does so by integrating the Robot Operating System (ROS), which is an industry and academia standard for robotics application development, with ReconOS, an operating system for reconfigurable hardware. In this thesis, an obstacle avoidance system is designed and implemented for an autonomous vehicle using ReconROS. The objectives of the thesis are to demonstrate and explore ReconROS integration within the ROS ecosystem, to explore the design process within the ReconROS framework, and to demonstrate the effectiveness of hardware acceleration in robotics by analysing the resulting architectures for latency and power consumption. 
AU - Sheikh, Muhammad Aamir ID - 29540 TI - Design and Implementation of a ReconROS-based Obstacle Avoidance System ER - TY - GEN AB - Robotics applications process large amounts of data in real-time and require compute platforms that provide high performance and energy-efficiency. FPGAs are well-suited for many of these applications, but there is a reluctance in the robotics community to use hardware acceleration due to increased design complexity and a lack of consistent programming models across the software/hardware boundary. In this paper we present ReconROS, a framework that integrates the widely-used robot operating system (ROS) with ReconOS, which features multithreaded programming of hardware and software threads for reconfigurable computers. This unique combination gives ROS2 developers the flexibility to transparently accelerate parts of their robotics applications in hardware. We elaborate on the architecture and the design flow for ReconROS and report on a set of experiments that underline the feasibility and flexibility of our approach. AU - Lienen, Christian AU - Platzner, Marco ID - 22764 T2 - arXiv:2107.07208 TI - Design of Distributed Reconfigurable Robotics Systems with ReconROS ER - TY - CONF AU - Hansmeier, Tim AU - Platzner, Marco ID - 21813 SN - 978-1-4503-8351-6 T2 - GECCO '21: Proceedings of the Genetic and Evolutionary Computation Conference Companion TI - An Experimental Comparison of Explore/Exploit Strategies for the Learning Classifier System XCS ER - TY - JOUR AB - Verification of software and processor hardware usually proceeds separately, software analysis relying on the correctness of processors executing machine instructions. This assumption is valid as long as the software runs on standard CPUs that have been extensively validated and are in wide use. 
However, for processors exploiting custom instruction set extensions to meet performance and energy constraints the validation might be less extensive, challenging the correctness assumption. In this paper we present a novel formal approach for hardware/software co-verification targeting processors with custom instruction set extensions. We detail two different approaches for checking whether the hardware fulfills the requirements expected by the software analysis. The approaches are designed to explore a trade-off between generality of the verification and computational effort. Then, we describe the integration of software and hardware analyses for both techniques and describe a fully automated tool chain implementing the approaches. Finally, we demonstrate and compare the two approaches on example source code with custom instructions, using state-of-the-art software analysis and hardware verification techniques. AU - Jakobs, Marie-Christine AU - Pauck, Felix AU - Platzner, Marco AU - Wehrheim, Heike AU - Wiersema, Tobias ID - 27841 JF - IEEE Access KW - Software Analysis KW - Abstract Interpretation KW - Custom Instruction KW - Hardware Verification TI - Software/Hardware Co-Verification for Custom Instruction Set Processors ER - TY - CONF AU - Ahmed, Qazi Arbab ID - 29138 T2 - 2021 IFIP/IEEE 29th International Conference on Very Large Scale Integration (VLSI-SoC) TI - Hardware Trojans in Reconfigurable Computing ER - TY - CONF AB - The battle of developing hardware Trojans and corresponding countermeasures has taken adversaries towards ingenious ways of compromising hardware designs by circumventing even advanced testing and verification methods. Besides conventional methods of inserting Trojans into a design by a malicious entity, the design flow for field-programmable gate arrays (FPGAs) can also be surreptitiously compromised to assist the attacker to perform a successful malfunctioning or information leakage attack. 
The advanced stealthy malicious look-up-table (LUT) attack activates a Trojan only when generating the FPGA bitstream and thus cannot be detected by register transfer and gate level testing and verification. However, this attack, too, was recently revealed by a bitstream-level proof-carrying hardware (PCH) approach. In this paper, we present a novel attack that leverages malicious routing of the inserted Trojan circuit to acquire a dormant state even in the generated and transmitted bitstream. The Trojan's payload is connected to primary inputs/outputs of the FPGA via a programmable interconnect point (PIP). The Trojan is detached from inputs/outputs during place-and-route and re-connected only when the FPGA is being programmed, thus activating the Trojan circuit without any need for trigger logic. Since the Trojan is injected in a post-synthesis step and remains unconnected in the bitstream, the presented attack can currently be prevented neither by conventional testing and verification methods nor by recent bitstream-level verification techniques. 
AU - Ahmed, Qazi Arbab AU - Wiersema, Tobias AU - Platzner, Marco ID - 20681 T2 - 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE) TI - Malicious Routing: Circumventing Bitstream-level Verification for FPGAs ER - TY - CONF AU - Clausing, Lennart ID - 30909 T2 - Proceedings of the 11th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies TI - ReconOS64: High-Performance Embedded Computing for Industrial Analytics on a Reconfigurable System-on-Chip ER - TY - CONF AU - Ghasemzadeh Mohammadi, Hassan AU - Jentzsch, Felix AU - Kuschel, Maurice AU - Arshad, Rahil AU - Rautmare, Sneha AU - Manjunatha, Suraj AU - Platzner, Marco AU - Boschmann, Alexander AU - Schollbach, Dirk ID - 30908 T2 - Machine Learning and Principles and Practice of Knowledge Discovery in Databases TI - FLight: FPGA Acceleration of Lightweight DNN Model Inference in Industrial Analytics ER - TY - CONF AU - Guettatfi, Zakarya AU - Kaufmann, Paul AU - Platzner, Marco ID - 3583 T2 - Proceedings of the International Workshop on Applied Reconfigurable Computing (ARC) TI - Optimal and Greedy Heuristic Approaches for Scheduling and Mapping of Hardware Tasks to Reconfigurable Computing Devices ER - TY - GEN AU - Chandrakar, Khushboo ID - 21324 TI - Comparison of Feature Selection Techniques to Improve Approximate Circuit Synthesis ER - TY - GEN AB - Robots are becoming increasingly autonomous and more capable. Because of the limited energy budget of portable sources such as batteries and increasingly demanding algorithms, efficient computation is of interest. Field Programmable Gate Arrays (FPGAs), for example, can provide fast and efficient processing, and the Robot Operating System (ROS) is a popular middleware used for robotic applications. The novel ReconROS combines version 2 of the Robot Operating System with ReconOS, a framework for integrating reconfigurable hardware. It provides a unified interface between software and hardware. 
ReconROS is evaluated in this thesis by implementing a Sobel filter as the video processing application, running on a Zynq-7000 series System on Chip. Execution and transfer times were measured and compared to theoretical values. The hardware implementation is designed in C code using High Level Synthesis together with the interface and functionality provided by ReconROS. An important aspect is the publish/subscribe mechanism of ROS. The Operating System interface functions for publishing and subscribing are reasonably fast at below 10 ms for a 1 MB color VGA image. The main memory interface performs well at higher data sizes, crossing 100 MB/s at 20 kB and increasing to a maximum of around 150 MB/s. Furthermore, the hardware implementation introduces consistency to the execution times and performs twice as fast as the software implementation. AU - Henke, Luca-Sebastian ID - 21432 TI - Evaluation of a ReconOS-ROS Combination based on a Video Processing Application ER - TY - CONF AU - Gatica, Carlos Paiz AU - Platzner, Marco ID - 21584 SN - 2522-8579 T2 - Machine Learning for Cyber Physical Systems (ML4CPS 2017) TI - Adaptable Realization of Industrial Analytics Functions on Edge-Devices using Reconfigurable Architectures ER -