Organic Computing as a concept to control parallel computations in P2P Grids

Introduction

In many scientific fields, such as bioinformatics, physics, and proteomics, there is a need for massively parallel computing power. This power is mostly provided by supercomputers or computer clusters that enable the execution of tightly coupled parallel computations. The availability of powerful computing resources and high-bandwidth network connections has allowed scientists to share data from simulations and experiments within their communities. At the end of the 1990s, scientists started to form collaborations in the framework of so-called e-science projects. The need to distribute, access, and store data from such projects motivated the creation of an infrastructure that couples distributed resources such as supercomputers, clusters, and storage servers into a so-called “Grid”. Above all, this infrastructure is intended to enable the globally distributed execution of parallel computations, which is why it is also referred to as a computational Grid.




Distributed computing on the Internet is a popular concept for the large-scale computation of applications that consist of a great number of independent tasks. Although distributed computing systems are usually implemented in a master/worker scheme, organizing them as P2P Grids, also known as ad hoc Grids, has been discussed for some time. The idea of this technology is to connect hundreds of thousands of independent computers (peers, nodes), each of which already uses some arbitrary communication technology, into a computational Grid. This environment allows spontaneous worldwide sharing and aggregation of resources with minimal administrative requirements. In general, P2P systems integrate resources that are less powerful than those in Grid systems, but their sheer number allows them to be aggregated and used for large coarse-grained parallel computations.



However, many computational problems cannot be divided into independent tasks. Medium-grained parallel computations, which are typical of scientific computing, are usually performed on computer clusters because they require nodes that are well connected to each other, in particular with low latency. This requirement is a matter of scheduling, i.e. the mapping of interacting processes to appropriate nodes. Since it is difficult to find well-connected clusters of nodes in a dynamic and heterogeneous P2P Grid environment, P2P Grids have not been used for medium-grained parallel computations so far.
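
To make the scheduling requirement concrete, the following minimal sketch picks a group of k nodes whose worst pairwise latency is small, so that interacting processes could be mapped to them. It is an illustration only and not the self-organized approach of this project: it assumes a central view of all pairwise latencies, which a real P2P Grid would not have. The function names, the greedy heuristic, and the latency values are hypothetical.

```python
from itertools import combinations


def max_pairwise_latency(nodes, latency):
    """Return the largest latency between any two nodes in the group."""
    return max((latency[a][b] for a, b in combinations(nodes, 2)), default=0.0)


def pick_well_connected(latency, k):
    """Greedily grow a group of k nodes from every possible seed node and
    return the group with the smallest worst-case pairwise latency.

    Assumption: `latency` is a full symmetric matrix known in one place.
    This is a heuristic, so the result is not guaranteed to be optimal.
    """
    n = len(latency)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of nodes")
    best_group, best_cost = None, float("inf")
    for seed in range(n):
        group = [seed]
        while len(group) < k:
            candidates = [c for c in range(n) if c not in group]
            # Add the candidate whose worst latency to the current group is smallest.
            nxt = min(candidates, key=lambda c: max(latency[c][g] for g in group))
            group.append(nxt)
        cost = max_pairwise_latency(group, latency)
        if cost < best_cost:
            best_group, best_cost = sorted(group), cost
    return best_group, best_cost


# Hypothetical latencies in milliseconds: nodes 0-2 are close to each other,
# nodes 3-4 are close to each other, and the two sets are far apart.
latency_ms = [
    [0, 2, 3, 80, 90],
    [2, 0, 2, 85, 95],
    [3, 2, 0, 70, 75],
    [80, 85, 70, 0, 5],
    [90, 95, 75, 5, 0],
]

print(pick_well_connected(latency_ms, 3))  # ([0, 1, 2], 3)
```

Even this small example needs the complete latency matrix, and its cost grows quickly with the number of nodes. A network of hundreds of thousands of peers whose latencies keep changing cannot be handled this way.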

The P2P concept itself might be ideal for finding adequate resources if each of them can be employed independently of all the others. However, a P2P overlay network does not reflect the connectivity of the underlying physical network. The widely distributed resources of a P2P Grid are heterogeneous with regard to both processing power and connectivity (network bandwidth and latency), as depicted in the figure above by different line widths. Moreover, their availability and performance change dynamically. The complexity of such a dynamic distributed system increases with the number of connected computers. All these aspects make task scheduling a fundamental problem in developing a P2P computational Grid system. Because of this complexity, a scalable solution is increasingly expected to come from self-organized approaches, in line with the fundamental methodological position of organic computing.




Last modified: Thu Feb 05 20:33:00 CET 2009