TY - CONF AB - This paper describes a data structure and a heuristic to plan and map arbitrary resources in complex combinations while applying time-dependent constraints. The approach is used in the planning-based workload manager OpenCCS at the Paderborn Center for Parallel Computing (PC\(^2\)) to operate heterogeneous clusters with up to 10000 cores. We also show performance results derived from four years of operation. AU - Keller, Axel ED - Klusáček, D. ED - Cirne, W. ED - Desai, N. ID - 22 KW - Scheduling Planning Mapping Workload management SN - 978-3-319-77398-8 T2 - Proc. Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP) TI - A Data Structure for Planning Based Workload Management of Heterogeneous HPC Systems VL - 10773 ER - TY - JOUR AB - Virtualization technology makes data centers more dynamic and easier to administrate. Today, cloud providers offer customers access to complex applications running on virtualized hardware. Nevertheless, big virtualized data centers become stochastic environments and the simplification on the user side leads to many challenges for the provider. The provider has to find cost-efficient configurations and deal with dynamic environments to ensure service level objectives (SLOs). We introduce a software solution that reduces the degree of human intervention to manage clouds. It is designed as a multi-agent system (MAS) and placed on top of the Infrastructure as a Service (IaaS) layer. Worker agents allocate resources, configure applications, check the feasibility of requests, and generate cost estimates. They are equipped with application-specific knowledge allowing them to estimate the type and number of necessary resources. During runtime, a worker agent monitors the job and adapts its resources to ensure the specified quality of service, even in noisy clouds where the job instances are influenced by other jobs.
They interact with a scheduler agent, which takes care of limited resources and performs cost-aware scheduling by assigning jobs to times with low costs. The whole architecture is self-optimizing and able to use public or private clouds. Building a private cloud poses the challenge of finding a mapping of virtual machines (VMs) to hosts. We present a rule-based mapping algorithm for VMs. It offers an interface where policies can be defined and combined in a generic way. The algorithm performs the initial mapping at request time as well as a remapping during runtime. It deals with policy and infrastructure changes. An energy-aware scheduler and the availability of cheap resources provided by a spot market are analyzed. We evaluated our approach by building up an SaaS stack, which assigns resources according to an energy function and ensures the SLOs of two different applications, a brokerage system and a high-performance computing software. Experiments were performed on a real cloud system and in simulations. AU - Niehörster, Oliver AU - Simon, Jens AU - Brinkmann, André AU - Keller, Axel AU - Krüger, Jens ID - 1965 IS - 3 JF - Journal of Grid Computing TI - Cost-aware and SLO Fulfilling Software as a Service VL - 10 ER - TY - CONF AB - Infrastructure as a Service providers use virtualization to abstract their hardware and to create a dynamic data center. Virtualization enables the consolidation of virtual machines as well as the migration of them to other hosts during runtime. Each provider has its own strategy to efficiently operate a data center. We present a rule-based mapping algorithm for VMs, which is able to automatically adapt the mapping between VMs and physical hosts. It offers an interface where policies can be defined and combined in a generic way. The algorithm performs the initial mapping at request time as well as a remapping during runtime. It deals with policy and infrastructure changes.
We extended the open source IaaS solution Eucalyptus and we evaluated it with typical policies: maximizing the compute performance and VM locality to achieve high performance and minimizing energy consumption. The evaluation was done on state-of-the-art servers in our own data center and by simulations using a workload from the Parallel Workloads Archive. The results show that our algorithm performs well in dynamic data center environments. AU - Kleineweber, Christoph AU - Keller, Axel AU - Niehörster, Oliver AU - Brinkmann, André ID - 1968 T2 - Proc. Int. Conf. on Parallel, Distributed and Network-Based Computing (PDP) TI - Rule Based Mapping of Virtual Machines in Clouds ER - TY - JOUR AB - System virtualization has become the enabling technology to manage the increasing number of different applications inside data centers. The abstraction from the underlying hardware and the provision of multiple virtual machines (VMs) on a single physical server have led to a consolidation and more efficient usage of physical servers. The abstraction from the hardware also eases the provision of applications on different data centers, as applied in several cloud computing environments. In this case, the application need not adapt to the environment of the cloud computing provider, but can travel around with its own VM image, including its own operating system and libraries. System virtualization and cloud computing could also be very attractive in the context of high-performance computing (HPC). Today, HPC centers have to cope with both the management of the infrastructure and the applications. Virtualization technology would enable these centers to focus on the infrastructure, while the users, collaborating inside their virtual organizations (VOs), would be able to provide the software. Nevertheless, there seems to be a contradiction between HPC and cloud computing, as there are very few successful approaches to virtualize HPC centers.
This work discusses the underlying reasons, including the management and performance, and presents solutions to overcome the contradiction, including a set of new libraries. The viability of the presented approach is shown based on evaluating a selected parallel, scientific application in a virtualized HPC environment. AU - Birkenheuer, Georg AU - Brinkmann, André AU - Kaiser, Jürgen AU - Keller, Axel AU - Keller, Matthias AU - Kleineweber, Christoph AU - Konersmann, Christoph AU - Niehörster, Oliver AU - Schäfer, Thorsten AU - Simon, Jens AU - Wilhelm, Maximilian ID - 1971 JF - Software: Practice and Experience TI - Virtualized HPC: a contradiction in terms? ER - TY - CONF AB - We present a multi-agent system on top of the IaaS layer consisting of a scheduler agent and multiple worker agents. Each job is controlled by an autonomous worker agent, which is equipped with application-specific knowledge (e.g., performance functions) allowing it to estimate the type and number of necessary resources. During runtime, the worker agent monitors the job and adapts its resources to ensure the specified quality of service, even in noisy clouds where the job instances are influenced by other jobs. All worker agents interact with the scheduler agent, which takes care of limited resources and performs cost-aware scheduling by assigning jobs to times with low energy costs. The whole architecture is self-optimizing and able to use public or private clouds. AU - Niehörster, Oliver AU - Keller, Axel AU - Brinkmann, André ID - 1972 T2 - Proc. Int. Meeting of the IEEE Int. Symp. on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS) TI - An Energy-Aware SaaS Stack ER - TY - CONF AU - Battré, Dominic AU - Hovestadt, Matthias AU - Kao, Odej AU - Keller, Axel AU - Voss, Kerstin ID - 1974 T2 - Proc. Int. Conf.
on Risks and Security of Internet and Systems TI - Quality Assurance of Grid Service Provisioning by Risk Aware Managing of Resource Failures ER - TY - CONF AB - Service Level Agreements (SLAs) are of central importance if commercial customers are to be attracted to the Grid. An SLA-aware resource management system has already been realized, which is able to fulfill the SLAs of jobs even in the case of resource failures. For this, it is able to migrate checkpointed jobs over the Grid. Here, virtual execution environments significantly increase the number of potential migration targets. In this paper, we outline the concept of such virtual execution environments and focus on the SLA negotiation aspects. AU - Battré, Dominic AU - Hovestadt, Matthias AU - Kao, Odej AU - Keller, Axel AU - Voss, Kerstin ID - 1975 T2 - Proc. Int. DMTF Academic Alliance Workshop on Systems and Virtualization Management: Standards and New Technologies TI - Virtual Execution Environments and the Negotiation of Service Level Agreements in Grid Systems ER - TY - CONF AB - Commercial Grid users demand contractually fixed QoS levels. Service Level Agreements (SLAs) are powerful instruments for describing such contracts. SLA-aware resource management is the foundation for realizing SLA contracts within the Grid. OpenCCS is such an SLA-aware RMS, using transparent checkpointing to cope with resource outages. It generates a compatibility profile for each checkpoint dataset, so that the job can be resumed even on resources within the Grid. However, only a small number of Grid resources comply with such a profile. This paper describes the concept of virtual execution environments and how they increase the number of potential migration targets. The paper also describes how these virtual execution environments have been implemented within the OpenCCS resource management system.
AU - Battré, Dominic AU - Hovestadt, Matthias AU - Kao, Odej AU - Keller, Axel AU - Voss, Kerstin ID - 1976 T2 - Proc. Int. Workshop on Scheduling and Resource Management for Parallel and Distributed Systems TI - Implementation of Virtual Execution Environments for improving SLA-compliant Job Migration in Grids ER - TY - CONF AU - Battré, Dominic AU - Hovestadt, Matthias AU - Kao, Odej AU - Keller, Axel AU - Voss, Kerstin ID - 1978 T2 - Proc. Int. Conf. on Grid Computing and Applications (GCA) TI - Germany, Belgium, France, and Back Again: Job Migration using Globus ER - TY - CONF AB - OpenCCS is an SLA-aware resource management system which uses transparent checkpointing of applications and migration of checkpoint datasets to ensure SLA compliance even in case of resource outages. Migration of checkpoints presumes a high degree of compatibility between source and target resources. Hence, even in large Grid systems only a small number of resources are eligible migration targets. This short paper describes the concept of virtual execution environments and how they increase the number of potential migration targets. It also outlines an implementation within OpenCCS. AU - Battré, Dominic AU - Hovestadt, Matthias AU - Kao, Odej AU - Keller, Axel AU - Voss, Kerstin ID - 1980 T2 - Proc. Int. Conf. on Services Computing (SCC) TI - Virtual Execution Environments for ensuring SLA-compliant Job Migration in Grids ER - TY - CONF AB - Contractually fixed service quality levels are mandatory prerequisites for attracting the commercial user to Grid environments. Service Level Agreements (SLAs) are powerful instruments for describing obligations and expectations in such a business relationship. At the level of local resource management systems, checkpointing and restart is an important instrument for realizing fault tolerance and SLA awareness. This paper highlights the concepts of migrating such checkpoint datasets to achieve the goal of SLA-compliant job execution.
AU - Battré, Dominic AU - Hovestadt, Matthias AU - Kao, Odej AU - Keller, Axel AU - Voss, Kerstin ID - 1981 SN - 978-0-7695-3177-9 T2 - Proc. Int. Conf. on Grid and Pervasive Computing (GPC) TI - Job Migration and Fault Tolerance in SLA-aware Resource Management Systems ER - TY - CONF AU - Battré, Dominic AU - Hovestadt, Matthias AU - Kao, Odej AU - Keller, Axel AU - Voss, Kerstin ED - Gonzalez, T. F. ID - 1983 SN - 978-0-88986-773-4 T2 - Proc. Int. Conf. on Parallel and Distributed Computing and Systems (PDCS) TI - Enhancing SLA Provisioning by Utilizing Profit-Oriented Fault Tolerance ER - TY - GEN AU - Battré, Dominic AU - Hovestadt, Matthias AU - Kao, Odej AU - Keller, Axel AU - Voss, Kerstin ID - 1984 TI - Increasing Fault-tolerance by Introducing Virtual Execution Environments ER - TY - GEN AU - Hovestadt, Matthias AU - Keller, Axel AU - Voss, Kerstin ID - 1985 T2 - Paderborner Universitätszeitschrift (puz) TI - Paderborn, Belgien, Frankreich und zurück VL - SS 2 ER - TY - CONF AB - Service level agreements (SLAs) are powerful instruments for describing all obligations and expectations in a business relationship. They are of central importance for deploying Grid technology in commercial applications. The EC-funded project HPC4U (Highly Predictable Clusters for Internet Grids) aimed at introducing SLA-awareness in local resource management systems, while the EC-funded project AssessGrid introduced the notion of risk, which is associated with every business contract. This paper highlights the concept of planning-based resource management and describes the SLA-aware scheduler developed and used in these projects. AU - Battré, Dominic AU - Hovestadt, Matthias AU - Kao, Odej AU - Keller, Axel AU - Voss, Kerstin ID - 1986 T2 - Proc.
Workshop of the UK Planning and Scheduling Special Interest Group (PlanSIG) TI - Planning-based Scheduling for SLA-awareness and Grid Integration ER - TY - CONF AU - Battré, Dominic AU - Hovestadt, Matthias AU - Kao, Odej AU - Keller, Axel AU - Voss, Kerstin ID - 1988 T2 - Proc. Cracow Grid Workshop, Academic Computer Center CYFRONET TI - Transparent Cross Border Migration of Parallel Multi Node Applications ER - TY - CHAP AU - Heine, Felix AU - Hovestadt, Matthias AU - Kao, Odej AU - Keller, Axel ED - Joubert, Gerhard R. ED - Nagel, Wolfgang E. ED - Peters, Frans J. ED - Plata, Oscar ED - Tirado, Francisco ED - Zapata, Emilio L. ID - 1989 T2 - Parallel Computing: Current and Future Issues of High End Computing TI - Provision of Fault Tolerance with Grid-enabled and SLA-aware Resource Management Systems ER - TY - CHAP AB - In this paper, we describe the architecture of the virtual resource manager VRM, a management system designed to reside on top of local resource management systems for cluster computers and other kinds of resources. The most important feature of the VRM is its capability to handle quality-of-service (QoS) guarantees and service-level agreements (SLAs). The particular emphasis of the paper is on the various opportunities to deal with local autonomy for resource management systems not supporting SLAs. As local administrators may not want to hand over complete control to the Grid management, it is necessary to define strategies that deal with this issue. Local autonomy should be retained as much as possible while providing reliability and QoS guarantees for Grid applications, e.g., specified as SLAs.
AU - Burchard, Lars-Olof AU - Heine, Felix AU - Heiss, Hans-Ulrich AU - Hovestadt, Matthias AU - Kao, Odej AU - Keller, Axel AU - Linnert, Barry AU - Schneider, Jörg ED - Getov, Vladimir ED - Laforenza, Domenico ED - Reinefeld, Alexander ID - 1991 T2 - Future Generation Grids TI - The Virtual Resource Manager: Local Autonomy versus QoS Guarantees for Grid Applications ER - TY - CHAP AB - Grid Computing promises an efficient sharing of world-wide distributed resources, ranging from hardware, software, and expert knowledge to special I/O devices. However, although the main Grid mechanisms are already developed or are currently addressed by tremendous research effort, the Grid environment still suffers from a low acceptance in different user communities. Besides difficulties regarding an intuitive and comfortable resource access, various problems related to the reliability and the Quality-of-Service while using the Grid exist. Users should be able to rely on their jobs having a certain priority at the remote Grid site and being finished by the agreed time regardless of any provider problems. Therefore, QoS issues have to be considered in the Grid middleware but also in the local resource management systems at the Grid sites. However, most of the currently used resource management systems are not suitable for SLAs, as they do not support resource reservation and do not offer mechanisms for job checkpointing or migration. The latter are mandatory for Grid providers as a rescue anchor in case of system failures or system overload. This paper focuses on SLA-aware job migration and presents work being performed in the EU-supported project HPC4U.
AU - Heine, Felix AU - Hovestadt, Matthias AU - Kao, Odej AU - Keller, Axel ED - Grandinetti, Lucio ID - 1990 T2 - Grid Computing: New Frontiers of High Performance Computing TI - SLA-aware Job Migration in Grid Environments VL - 14 ER - TY - CONF AB - The next generation grid applications demand grid middleware for a flexible negotiation mechanism supporting various ways of quality-of-service (QoS) guarantees. In this context, a QoS guarantee covers simultaneous allocations of various kinds of different resources, such as processor runtime, storage capacity, or network bandwidth, which are specified in the form of service level agreements (SLAs). Currently, a gap exists between the capabilities of grid middleware and the underlying resource management systems concerning their support for QoS and SLA negotiation. In this paper, we present an approach which closes this gap. Introducing the architecture of the virtual resource manager, we highlight its main QoS management features like run-time responsibility, co-allocation, and fault tolerance. AU - Burchard, Lars-Olof AU - Heine, Felix AU - Hovestadt, Matthias AU - Kao, Odej AU - Keller, Axel AU - Linnert, Barry ID - 1992 T2 - Proc. IEEE Int. Parallel & Distributed Processing Symposium (IPDPS) TI - A Quality-of-Service Architecture for Future Grid Computing Applications ER -