List of affiliations, talk titles and abstracts

=====
Robert Ross
Title: TBD

http://www-unix.mcs.anl.gov/~rross/
rross@mcs.anl.gov
=====
Angelos Bilas
Title: Scalable Storage Systems & Efficient Remote Block-level I/O
Abstract: Modern storage systems must scale to large storage capacities and high I/O throughput in a cost-effective manner. For this reason, they are increasingly being built out of commodity components, mainly PCs equipped with large numbers of disks and interconnected by high-performance system area networks. A main challenge in these efforts is achieving high I/O throughput over commodity, low-cost system area networks and commodity operating systems. In this talk I will first give an overview of the problems in the area, then sketch our approach for addressing architectural limitations of storage systems, and finally present our recent work in understanding the performance of remote block-level storage I/O over commodity, RDMA-capable network interfaces and networks.

Angelos Bilas, Associate Professor
Univ. of Crete and FORTH
ICS-FORTH, STEP-C
P.O. Box 1385
71110 Heraklion, Greece
E-mail: bilas@csd.uoc.gr
URL: www.csd.uoc.gr/~bilas
Tel: +30 2810391669
Fax: +30 2810391661
=====
Peter Sobe
Title: Redundancy Schemes for Distributed Storage
Abstract: To store data permanently, distributed storage systems should employ redundancy codes and tolerate failures of storage nodes. This talk focuses on combining striping-based parallel storage with several variants of parity and Reed-Solomon coding. Several insights into flexible coding and efficient encoding and decoding procedures are presented. We provide performance results obtained for several redundancy layouts with the NetRAID storage system and correlate them with analytically derived reliabilities.

Dr.-Ing.
Peter Sobe
University of Luebeck
Institute of Computer Engineering
Ratzeburger Allee 160
D-23538 Luebeck
E-Mail: sobe@iti.uni-luebeck.de
Web: www.iti.uni-luebeck.de
=====
Michael Kuhn, Christian Lohse
Title: File Systems for Mass Storage of Image Data in Bioinformatics
Abstract: Not only do certain applications produce large amounts of data, they also split this data into a large number of files -- in the example shown, several million. Therefore, it is important that the underlying storage systems provide a large amount of space, sufficient throughput and the ability to handle such enormous numbers of files. We have tested three distributed file systems -- PVFS2, Lustre and GPFS -- against these needs. To do this, we used the existing synthetic file system benchmark b_eff_io and also developed tests to measure performance with an increasing number of files and throughput with a varying number of clients. The results revealed problems with metadata handling in some of the tested file systems, which also affect their overall performance.

Universität Heidelberg
Research Group of Thomas Ludwig
michael.kuhn@stud.uni-heidelberg.de
lohse.c@gmx.de
=====
Julian Kunkel
Title: Performance Analysis of Parallel File Systems
Abstract: It is hard to locate performance bottlenecks in complex software such as a parallel file system. Stubs can be used to simplify locating a bottleneck. An efficient stub can replace a layer or module of the file system and report operations as completed to the other components. This yields an approximate maximum throughput for the other layers, which can be compared with the theoretical throughput. For instance, replacing the persistency layer with a fast stub could reveal bugs in the architecture of the parallel file system and might help to locate inefficiencies in the persistency layer.
In addition, some results of using such a stub are given for PVFS, and a simple performance model is introduced that estimates the performance of a parallel file system based on the influence of disk, CPU and network.

Title: Migration and Load Balancing Within PVFS2
Abstract: In this technical talk, a mechanism for the migration of datafiles in PVFS (V2) is introduced. In PVFS, a logical file is usually striped over multiple datafiles, which are located on different I/O servers to increase throughput. Migration can be used to implement load balancing by moving datafiles from busy servers to idle servers. This could make more efficient use of the available resources. Such a load balancing concept fits well into PVFS. The author is currently implementing the presented mechanisms and will evaluate different policies and use cases of this concept in the future.

Universität Heidelberg
Research Group of Thomas Ludwig
Julian.Kunkel@gmx.de
=====
Stephan Krempel
Title: Connecting MPI-IO Calls to Their Corresponding PVFS2 Disk Operations
Abstract: We find ourselves in a cluster environment with MPICH2 as the message passing interface, running parallel programs that use ROMIO for input/output to a PVFS2 parallel file system. In the past we could see that MPI-IO calls in our programs trigger PVFS2 disk operations as expected, but unfortunately we could not see which of them belong together. The talk presents two approaches to overcome this deficiency. The first is the one introduced in the speaker's Bachelor thesis in March 2006. The second is a new realization of the same idea, this time using the implementation of a PVFS_hint mechanism proposed by Julian Kunkel.
Universität Heidelberg
Research Group of Thomas Ludwig
stephan.krempel@gmx.de
=====
Lars Schneidenbach
Title: Analysis of the BMI/TCP Module
Abstract: This work was motivated by the search for the reason(s) for the latency gap between raw TCP and BMI/TCP, since we were able to reduce the gap from 20 µs to 5 µs in the BMI/GAMMA module. The talk will focus on the following questions:
- Are there any implementation or design issues that affect BMI/TCP performance?
- How do BMI interface semantics match TCP?

Dipl.Inf. Lars Schneidenbach
Universität Potsdam
Institut f. Informatik
August-Bebel-Straße 89, Haus 4
D-14482 Potsdam, Germany
lschneid@cs.uni-potsdam.de
Tel: +49 331/977 3123
Fax: +49 331/977 3122
=====
Alejandro Calderón Mateos
Title: Fault tolerant support for parallel file systems in clusters
Abstract: To increase cluster I/O performance, more I/O elements are used, and to keep the solution low-cost, commodity hardware is used. As a result, the reliability of the system decreases as more I/O elements are added. Traditional solutions focus on redundant hardware (both disks and network connections), but in the end these solutions increase the cost and/or limit system scalability. Moreover, the fault-tolerance support is the same for all user files and cannot be changed in a natural way during a file's lifetime. Users have no way to tailor the parallel file system to their own needs. This talk introduces some ideas for adding flexible fault-tolerance support to parallel file systems. The main advantages of the new system are:
* fault-tolerance support at file level
* the fault-tolerance support can be changed during the file's lifetime
* users can easily select a solution that meets their requirements
Preliminary work on the proposed ideas has been done for the Expand parallel file system, and we are now working on adding this support to PVFS2.
University: Universidad Carlos III de Madrid
Postal address:
Escuela Politécnica Superior
Universidad Carlos III de Madrid
Av. Universidad 30
28911 Leganés (Madrid)
Spain
Tel. (UC3M): +34 91 624 9497
Fax (UC3M): +34 91 624 9129
Webpage: http://arcos.inf.uc3m.es/~acaldero/
Temporary postal address (until end of October):
GastdozentenHaus
Universität Stuttgart
Pfaffenwaldring 54
70569 Stuttgart
Germany
Temporary Tel. (until end of October): +49 711 687 00853
=====
Sebastian Kalcher
Title: ClusterRAID: Reliable Mass Storage on Unreliable Cluster Components
Abstract: Upcoming high-energy physics experiments at CERN will deploy large PC clusters as part of the detector readout chain (e.g. the high-level trigger) or for the offline analysis of data. Given the current growth rate of hard drive capacity, the clusters of the experiments will have an online disk capacity that is significant compared with the experiments' mass storage requirements. However, in order to harness this online capacity for a distributed storage system, one crucial problem has to be solved: the inherent unreliability of the cluster components, in particular of the hard drives. We present the concept and preliminary results for the ClusterRAID, a fault-tolerant distributed mass storage system built on the hard disks in a PC farm. The key paradigm of the architecture is to make the local hard drive reliable. By using a special variant of the well-known Reed-Solomon codes for error correction, the system provides an adjustable degree of fault tolerance while minimizing both the required network transactions and the space overhead. The prototype, realized as a set of kernel modules for the GNU/Linux operating system, includes offloading the compute-intensive coding to powerful graphics hardware and the development of a dedicated FPGA-based coprocessor.
Kirchhoff Institute of Physics
Computer Science/Computer Engineering
INF 227, Office 3.314
69120 Heidelberg
Tel.: (+49) 6221/54-9815
http://www.kip.uni-heidelberg.de/ti/
=====
Christoph Biardzki
Title: An introduction to the HLRB2 I/O-Subsystem
Abstract: The "Höchstleistungsrechner in Bayern 2" (HLRB2) is the newest German national supercomputer. Installation Phase 1, in operation since September 11th, consists of a 4096-processor SGI Altix NUMA system with 17 TB of memory and a peak performance of 24.5 TFlops. In contrast to its predecessor, more than 30% of the budget has been spent on I/O subsystems: a 40 TB scalable NAS system and a 300 TB CXFS file system using more than 1400 disks. The talk will show requirements and design considerations, present benchmark results obtained during acceptance tests and discuss "lessons learned".

--
Leibniz-Rechenzentrum (LRZ)
http://www.lrz.de
Abteilung Hochleistungssysteme
Raum I.3.071 - Boltzmannstraße 1, 85748 Garching
Tel. +49 89 35831-8853
=====
Hipolito Vasquez
Universität Heidelberg
Research Group of Thomas Ludwig
hipolito.vasquez@informatik.uni-heidelberg.de
=====
Glenn Luecke
Iowa State University
271 Durham Center
Ames IA 50011-2251
Phone: (515) 294-6659
Fax: (515) 294-1717
grl@iastate.edu
=====