Organic Computing as a concept to control parallel computations in P2P Grids
In many scientific fields, such as bioinformatics, physics or proteomics,
there is a need for massive parallel computation power. This power is
mostly provided by supercomputers or computer clusters that enable
the execution of tightly coupled parallel computations. The
availability of powerful computing resources and high-bandwidth
network connections has allowed scientists to share the data from
their simulations and experiments within their communities. At the end
of the 1990s, scientists started to form collaborations within
so-called e-science projects. The need to distribute, access and store
the data from such projects motivated the creation of an infrastructure
that couples distributed resources such as supercomputers, clusters and
storage servers into a so-called “Grid”. Above all, this
infrastructure is intended to enable the global-scale distributed
execution of parallel computations, which is why it is also
referred to as a computational Grid.

Distributed computing on the Internet is a popular concept for the
large-scale computation of applications consisting of a great number
of independent tasks. Although distributed computing systems are
usually implemented in a master/worker scheme, organizing them as P2P
Grids, also known as ad hoc Grids, has been discussed for some time.
The idea of this technology is to connect hundreds of thousands of
independent computers (peers, nodes), which are already linked by
arbitrary communication technologies, into a computational Grid. This
environment allows spontaneous worldwide sharing and aggregation of
resources with minimal administrative effort. In general, P2P systems
integrate resources that are less powerful than those in Grid systems,
but their sheer quantity allows them to be aggregated and used for
large, coarse-grained parallel computations.
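For illustration only, the master/worker scheme mentioned above can be sketched in a few lines. This is a minimal, single-machine sketch under stated assumptions: threads stand in for remote worker nodes, and the function name run_master_worker and its parameters are hypothetical, not part of any real Grid middleware. The key property it shows is that each task is independent, so workers never need to communicate with each other, only with the master.

```python
import queue
import threading


def run_master_worker(tasks, work, n_workers=4):
    """Hypothetical master/worker sketch: threads stand in for remote peers.

    The master enqueues independent tasks; each worker repeatedly pulls a
    task, processes it without talking to any other task, and reports the
    result back under the task's id.
    """
    todo = queue.Queue()
    results = {}
    lock = threading.Lock()

    # Master: enqueue every task before any worker starts.
    for task_id, payload in enumerate(tasks):
        todo.put((task_id, payload))

    def worker():
        while True:
            try:
                task_id, payload = todo.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            value = work(payload)  # independent task: no inter-worker messages
            with lock:
                results[task_id] = value

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Master: reassemble results in the original task order.
    return [results[i] for i in range(len(tasks))]
```

For example, run_master_worker(list(range(10)), lambda x: x * x) returns the ten squares in order, regardless of which worker handled which task. Medium-grained computations break exactly this assumption, because their processes must exchange data while they run.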
However, many computational problems cannot be divided into independent
tasks. Medium-grained parallel computations, which are typical of
scientific computing, are usually performed on computer clusters
because they require nodes that are well connected to each other,
i.e. connected with low latency. This requirement turns the problem
into one of scheduling, i.e. the mapping of interacting processes to
appropriate nodes. Since it is difficult to find clusters of
well-connected nodes in a dynamic and heterogeneous P2P Grid
environment, P2P Grids have so far not been used to perform
medium-grained parallel computations.
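To make the scheduling requirement concrete, the following sketch picks k nodes whose worst pairwise latency (the cluster "diameter") is small, using a simple greedy heuristic. It is a hypothetical illustration, not the method of the organic computing work: it assumes a complete, static latency matrix known to a single scheduler, which is precisely the global knowledge that a dynamic P2P Grid usually cannot provide. The function names and the example latencies are invented for the illustration.

```python
from itertools import combinations


def cluster_diameter(nodes, latency):
    # Worst-case (maximum) pairwise latency inside a candidate cluster.
    return max((latency[a][b] for a, b in combinations(nodes, 2)), default=0.0)


def greedy_cluster(latency, k):
    """Hypothetical centralized heuristic: grow a cluster from every seed node,
    always adding the node whose worst latency to the cluster is smallest,
    and keep the cluster with the smallest diameter."""
    n = len(latency)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of nodes")
    best, best_d = None, float("inf")
    for seed in range(n):
        chosen = [seed]
        while len(chosen) < k:
            candidate = min(
                (v for v in range(n) if v not in chosen),
                key=lambda v: max(latency[v][u] for u in chosen),
            )
            chosen.append(candidate)
        d = cluster_diameter(chosen, latency)
        if d < best_d:
            best, best_d = sorted(chosen), d
    return best, best_d


# Invented latencies in ms: nodes 0-2 form a tight group, nodes 3-4 are remote.
LATENCY = [
    [0, 1, 1, 50, 60],
    [1, 0, 1, 55, 50],
    [1, 1, 0, 52, 58],
    [50, 55, 52, 0, 2],
    [60, 50, 58, 2, 0],
]
```

With these numbers, greedy_cluster(LATENCY, 3) selects nodes [0, 1, 2] with a diameter of 1 ms, so a tightly coupled three-process job would be mapped onto that group rather than across the slow links.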

The P2P concept itself might be ideal for finding adequate resources
if each of them can be employed independently of all the others.
However, a P2P overlay network does not reflect the connectivity of
the underlying physical network layer. The widely distributed
resources of a P2P Grid are heterogeneous with respect to both
processing power and connectivity (network bandwidth and latency), as
depicted by the different line widths in the figure above. Moreover,
their availability and performance change dynamically. The complexity
of such a dynamic distributed system grows with the number of
connected computers. All these aspects make task scheduling a
fundamental problem in developing a P2P computational Grid system.
Because of this complexity, a scalable solution to this problem is
increasingly expected to come from self-organizing approaches, in line
with the fundamental methodological position of organic computing.
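As a hedged illustration of what "self-organizing" can mean here, the sketch below uses a generic gossip-style neighbor-selection heuristic, chosen only as an example and not taken from the organic computing literature. Each node starts with random overlay neighbors that ignore the physical network, and in every round it looks only at its neighbors and their neighbors (purely local information) and keeps the k with the lowest latency. The one-dimensional node positions, the latency function and all names are hypothetical stand-ins for real latency measurements.

```python
import random


def latency(pos, a, b):
    # Hypothetical physical latency: distance between node coordinates.
    return abs(pos[a] - pos[b])


def self_organize(pos, k=3, rounds=10, seed=0):
    """Illustrative gossip-style rewiring using only local views.

    Requires len(pos) > k. Because each node's current neighbors are always
    among its candidates, the total neighbor latency never increases.
    """
    rng = random.Random(seed)
    n = len(pos)
    # Random initial overlay: unrelated to the physical network.
    nbrs = {v: rng.sample([u for u in range(n) if u != v], k) for v in range(n)}
    for _ in range(rounds):
        new = {}
        for v in range(n):
            # Local view: own neighbors plus the neighbors' neighbors.
            cand = set(nbrs[v])
            for u in nbrs[v]:
                cand.update(nbrs[u])
            cand.discard(v)
            # Keep the k candidates with the lowest latency.
            new[v] = sorted(cand, key=lambda u: latency(pos, v, u))[:k]
        nbrs = new
    return nbrs


def mean_neighbor_latency(pos, nbrs):
    # Average latency over all directed overlay links.
    pairs = [(v, u) for v, us in nbrs.items() for u in us]
    return sum(latency(pos, v, u) for v, u in pairs) / len(pairs)
```

Comparing mean_neighbor_latency before (rounds=0) and after several rounds shows the overlay drifting toward the physical proximity structure without any node ever holding global knowledge. Such an approach trades optimality for scalability, which is the kind of trade-off the organic computing position accepts.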