TY - CONF
AB - The next generation Grid will require the Grid middleware to provide flexibility, transparency, and reliability. This implies the application of service level agreements to guarantee a negotiated level of quality of service. These requirements also affect the local resource management systems providing resources for the Grid. Here a gap between these demands and the features of today's resource management systems becomes apparent. In this paper we present an approach which closes this gap. Introducing the architecture of the virtual resource manager, we highlight its main features of runtime responsibility, resource virtualization, information hiding, autonomy provision, and smooth integration of existing resource management system installations.
AU - Burchard, Lars-Olof
AU - Hovestadt, Matthias
AU - Kao, Odej
AU - Keller, Axel
AU - Linnert, Barry
ID - 1995
T2 - Proc. Int. Symposium on Cluster Computing and the Grid (CCGRID)
TI - Virtual Resource Manager: An Architecture for SLA-aware Resource Management
ER -

TY - CONF
AB - Nearly all existing HPC systems are operated by resource management systems based on the queuing approach. With the increasing acceptance of grid middleware like Globus, new requirements for the underlying local resource management systems arise. Features like advanced reservation or quality of service are needed to implement high level functions like co-allocation. However, it is difficult to realize these features with a resource management system based on the queuing concept since it considers only the present resource usage. In this paper we present an approach which closes this gap. By assigning start times to each resource request, a complete schedule is planned. Advanced reservations are now easily possible. Based on this planning approach, functions like diffuse requests, automatic duration extension, or service level agreements are described.
We think they are useful to increase the usability, acceptance and performance of HPC machines. In the second part of this paper we present a planning based resource management system which already covers some of the mentioned features.
AU - Hovestadt, Matthias
AU - Kao, Odej
AU - Keller, Axel
AU - Streit, Achim
ID - 1998
KW - High Performance Computing
KW - Service Level Agreement
KW - Grid Resource
KW - Resource Management System
KW - Advance Reservation
T2 - Proc. Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP)
TI - Scheduling in HPC Resource Management Systems: Queuing vs. Planning
VL - 2862
ER -

TY - CONF
AU - Miller, Barton P.
AU - Labarta, Jesús
AU - Schintke, Florian
AU - Simon, Jens
ID - 2426
SN - 978-3-540-45706-0
T2 - Proc. European Conf. on Parallel Processing (Euro-Par)
TI - Performance Evaluation, Analysis and Optimization
VL - 2400
ER -

TY - CONF
AU - Schintke, Florian
AU - Simon, Jens
AU - Reinefeld, Alexander
ID - 2431
T2 - Proc. Int. Conf. on Computational Science (ICCS)
TI - A Cache Simulator for Shared Memory Systems
VL - 2074
ER -

TY - CONF
AB - The Testbed and Applications working group of the European Grid Forum (EGrid) is actively building and experimenting with a grid infrastructure connecting several research-based supercomputing sites located in Europe. The paper reports on our first feasibility study: running a self-migrating version of the Cactus simulation code across the European grid testbed, including "live" remote data visualization and steering from different demonstration booths at Supercomputing 2000, in Dallas, TX. We report on the problems that had to be resolved for this endeavour and identify open research challenges for building production-grade grid environments.
AU - Gehring, Jörn
AU - Keller, Axel
AU - Reinefeld, Alexander
AU - Streit, Achim
ID - 2000
T2 - Proc. Int.
Symposium on Cluster Computing and the Grid (CCGRID)
TI - Early Experiences with the EGrid Testbed
ER -

TY - CONF
AB - The availability of commodity high performance components for workstations and networks made it possible to build up large, PC based compute clusters at modest costs. These clusters seem to be a realistic alternative to proprietary, massively parallel systems with respect to the price/performance ratio. However, from the administration point of view, those systems are still often solely a collection of autonomous nodes, connected by a fast short area network. Therefore, aiming at providing the best possible performance in daily work to all users, a lot of work has to be done before obtaining the expected result. The paper describes the problem areas we had to cope with during the integration of two large SCI clusters (one with 64 and one with 192 processors) in the environment of the Paderborn Center for Parallel Computing.
AU - Keller, Axel
AU - Krawinkel, Andreas
ID - 2002
T2 - Proc. Int. Symposium on Cluster Computing and the Grid (CCGRID)
TI - Lessons Learned While Operating Two Large SCI Clusters
ER -

TY - CONF
AB - RsdEditor is a graphical user interface which produces specifications of computational resources. It is used in the RSD (Resource and Service Description) environment for specifying, registering, requesting and accessing resources and services in a metacomputer. RsdEditor was designed to be used by the administrators and users of metacomputing environments. At the administrator level, the GUI is used to describe the available computing and networking components of a metacomputer. At the user level, RsdEditor can be used to specify which characteristics of the computational resources are needed to execute a meta-application. This paper is organized as follows: it first introduces RsdEditor. It then briefly describes the RSD environment, and finally, it highlights various features and implementation issues of RsdEditor.
AU - Baraglia, Ranieri
AU - Keller, Axel
AU - Laforenza, Domenico
AU - Reinefeld, Alexander
ID - 2003
T2 - Proc. Heterogeneous Computing Workshop HCW at IPDPS
TI - RsdEditor: A Graphical User Interface for Specifying Metacomputer Components
ER -

TY - CONF
AU - Brune, Matthias
AU - Reinefeld, Alexander
AU - Varnholt, Jörg
ID - 2436
T2 - Proc. Int. Symp. High-Performance Distributed Computing (HPDC)
TI - A Resource Description Environment for Distributed Computing Systems
ER -

TY - CONF
AB - With the recent availability of cost-effective network cards for the PCI bus, researchers have been tempted to build up large compute clusters with standard PCs. Many of them are operated with workstation cluster management software in high-throughput or single user mode. For very large clusters with more than 100 PEs, however, it becomes necessary to implement full-fledged resource management software that allows partitioning the system for multi-user access. In this paper, we present our Computing Center Software (CCS), which was originally designed for managing massively parallel high-performance computers and is now adapted to modern workstation clusters. It provides: partitioning of exclusive and non-exclusive resources; hardware-independent scheduling of interactive and batch jobs; open, extensible interfaces to other resource management systems; a high degree of reliability.
AU - Brune, Matthias
AU - Keller, Axel
AU - Reinefeld, Alexander
ID - 2004
T2 - Proc. Int. Conf. on High-Performance Computing and Networking (HPCN)
TI - Resource Management for High-Performance PC Clusters
ER -

TY - CONF
AB - CCS is a resource management system for parallel high-performance computers. At the user level, CCS provides vendor-independent access to parallel systems. At the system administrator level, CCS offers tools for controlling (i.e., specifying, configuring and scheduling) the system components that are operated in a computing center. Hence the name "Computing Center Software".
CCS provides: hardware-independent scheduling of interactive and batch jobs; partitioning of exclusive and non-exclusive resources; open, extensible interfaces to other resource management systems; a high degree of reliability (e.g. automatic restart of crashed daemons); fault tolerance in the case of network breakdowns. The authors describe CCS as one important component for the access, job distribution, and administration of networked HPC systems in a metacomputing environment.
AU - Keller, Axel
AU - Reinefeld, Alexander
ID - 2011
T2 - Proc. Heterogeneous Computing Workshop (HCW) at IPPS
TI - CCS Resource Management in Networked HPC Systems
ER -