
The GA-Based Approach to Optimization of Parallel Calculations in Large Physics Problems.



Dr. Andrey Nikitin

Dr. Ludmila Nikitina




Moscow State University

andrey@cs.msu.su



Abstract.

The parallelization of computational algorithms for large physics problems on a multiprocessor computer system containing a thousand or more processor elements requires special tools. In the present work an approach based on the Genetic Algorithm (GA) is proposed. Experimental results and the influence of the GA parameters on the convergence of the method are given. The approach may be used for real-time parallelization.



1. Introduction.

Solving the parallelization problem for a multiprocessor computer system containing a thousand or more processor elements requires a huge number of operations. The problem is NP-complete, so the traditional algorithm of exhaustive search cannot be applied for this purpose.

The Genetic Algorithm (GA) [1] is a stochastic search technique based on ideas adopted from nature. GA has been successfully applied to many NP-complete problems, such as the Travelling Salesman Problem or the mapping problem [5].

Existing methods do not allow exploiting high-level parallelism or taking into account the performance of the communication channels between processor elements.

In the present work an approach based on the Genetic Algorithm is proposed. The method may be applied to computer systems with a huge number of processor elements to exploit high-level parallelism as well as operation-level parallelism.

The parallelization problem is formulated here as an optimisation of the partitioning of a graph into subgraphs.



2. Background.

The criterion for the distribution of nodes between subgraphs is the minimization of the running time of the algorithm on the system together with the time required for data transfer between processor elements. In this scheme, every subgraph corresponds to some processor element.

The computational algorithm can be represented as a weighted graph [3]. A functional estimating the running time of the algorithm and the data transfer time is determined by a given partition R; this functional is described below.

A program is represented by an acyclic directed weighted graph Ga = <C, S>. The nodes of the graph are the statements of the algorithm and the edges represent the information exchange between them. C is a vector of node weights (calculation costs) and S is the adjacency matrix of the graph. An acyclic graph admits a decomposition into layers; nodes belonging to the same layer can be executed simultaneously. Let h be the number of layers. The decomposition into layers is represented by a matrix H (h x n), where h_kj is 1 if the j-th node belongs to the k-th layer and 0 otherwise.
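
As an illustration, the layer decomposition can be computed as in the sketch below (the function name build_layers and the use of NumPy are our choices, not part of the paper): every node is placed one layer after the deepest of its predecessors, which yields exactly the matrix H described above.

import numpy as np

def build_layers(S):
    """Decompose an acyclic directed graph, given by its adjacency
    matrix S (S[i, j] != 0 means an edge i -> j), into layers.
    Returns H (h x n) with H[k, j] = 1 iff node j belongs to layer k."""
    n = S.shape[0]
    layer = np.full(n, -1, dtype=int)          # -1 = not assigned yet
    remaining = set(range(n))
    while remaining:
        # a node is ready once all of its predecessors have a layer
        ready = [j for j in remaining
                 if all(layer[i] >= 0 for i in range(n) if S[i, j])]
        for j in ready:
            preds = [layer[i] for i in range(n) if S[i, j]]
            layer[j] = max(preds) + 1 if preds else 0
        remaining -= set(ready)
    H = np.zeros((layer.max() + 1, n), dtype=int)
    H[layer, np.arange(n)] = 1
    return H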

A multiprocessor computer system is represented by a weighted graph Gc = <π, τ>. The nodes of the graph are processor elements and π_i is the performance of the i-th processor element. There is an edge between the i-th and j-th nodes if there is a link between the corresponding processors, and τ (p x p) is the communication performance matrix.

We assume:

n is the number of statements of the algorithm;

p is the number of processor elements in the computer;

R (p x n) is the partition matrix: r_il is 1 if node l belongs to the i-th subgraph and 0 otherwise.

We use a matrix G (h x p) whose element g_ki is the total weight of the nodes belonging to the i-th subgraph and the k-th layer of the algorithm graph:

g_{ki} = \sum_{j=1}^{n} h_{kj} r_{ij} c_j .

The cost of the data transfer between processor elements is defined by a matrix D (p x p), whose element d_ij is the total weight of the edges connecting the i-th and j-th subgraphs:

d_{ij} = \sum_{k=1}^{n} \sum_{l=1}^{n} r_{ik} s_{kl} r_{jl} .

It is now possible to define the functional J(R), which estimates the running time of the algorithm on the system together with the time of data transfer for a given partition R. The following quantities enter J(R):

π_0 is the performance of a single processor element;

Ua is the total weight of the algorithm graph;

L (p x p) is the matrix of minimal distances between the i-th and j-th processors; if both communicating algorithm nodes are allocated to the same processor, the distance is zero.

To achieve maximum performance we need to find the partition matrix R that minimizes J(R):

J(R) -> min.

The GA carries out this optimisation process and builds the matrix R; J(R) is used as the fitness function.
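
The sketch below shows how such a fitness function might be evaluated from the quantities defined above. The function name evaluate_fitness is ours and, since the form of J(R) is described here only qualitatively, the sketch assumes one plausible form: per-layer computation time plus distance-weighted communication cost, normalized by the single-processor time Ua/π_0. The weighting used in the actual implementation may differ.

import numpy as np

def evaluate_fitness(H, R, C, S, perf, L, perf0):
    """Assumed form of J(R), lower is better.
    H: (h, n) layer matrix        R: (p, n) partition matrix
    C: (n,)  node weights         S: (n, n) adjacency matrix
    perf: (p,) processor performances (pi_i)
    L: (p, p) minimal inter-processor distances
    perf0: reference processor performance (pi_0)"""
    G = H @ (R * C).T                      # g_ki = sum_j h_kj r_ij c_j
    comp = (G / perf).max(axis=1).sum()    # every layer waits for its slowest processor
    D = R @ S @ R.T                        # d_ij = data sent between subgraphs i and j
    comm = (D * L).sum()                   # distance-weighted communication cost
    Ua = C.sum()                           # total weight of the algorithm graph
    return (comp + comm) / (Ua / perf0)    # relative to the single-processor time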



3. Implementation details.

The Genetic Algorithm can be characterized by the following parameters:

- coding strategy,

- population size,

- initialisation strategy,

- selection technique,

- crossover mechanism,

- mutation mechanism,

- strategy for offspring inclusion.

Below, we briefly describe techniques used in this implementation.

Coding strategy - the partition is encoded by a chromosome represented as a byte string whose length is equal to the number of nodes in the algorithm graph. The number coded at the i-th position (gene) of the chromosome is the number of the processor element on which the i-th node is placed.
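
A minimal sketch of this encoding and of the conversion between a chromosome and the partition matrix R (the function names are ours; the byte representation limits the gene values to 256 processor numbers):

import numpy as np

def random_chromosome(n, p, rng=np.random.default_rng()):
    """Random individual: gene i holds the processor of node i."""
    return rng.integers(0, p, size=n, dtype=np.uint8)

def to_partition_matrix(chrom, p):
    """Decode a chromosome into the partition matrix R (p x n)."""
    n = len(chrom)
    R = np.zeros((p, n), dtype=int)
    R[chrom, np.arange(n)] = 1
    return R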

Population size - for the algorithm presented in this paper, there are no restrictions on the number of chromosomes in the population.

Initialisation strategy - initial individuals are either chosen completely at random or constructed by a modified weighted-random procedure.

Selection technique - two methods of choosing an individual for reproduction are used and compared (both are sketched after this list):

- roulette wheel: the probability that individual i with fitness f_i is used as a parent is proportional to its relative fitness in the population;

- linear rank selection: the probability that individual i with fitness f_i is used as a parent is based on its rank in the population as a whole.
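
A minimal sketch of both schemes, assuming that the fitness J(R) is minimized, so a lower value must translate into a higher selection probability; the function names and the cost-to-weight conversion are ours:

import numpy as np

rng = np.random.default_rng()

def roulette_wheel_select(costs):
    """Selection probability proportional to relative fitness;
    J(R) is minimized, so cost is converted to a positive weight."""
    weights = 1.0 / (np.asarray(costs, dtype=float) + 1e-12)
    return rng.choice(len(costs), p=weights / weights.sum())

def linear_rank_select(costs):
    """Selection probability based only on the rank of the individual:
    the best (lowest-cost) individual gets the largest weight."""
    order = np.argsort(costs)                      # best first
    ranks = np.empty(len(costs), dtype=float)
    ranks[order] = np.arange(len(costs), 0, -1)    # best -> N, ..., worst -> 1
    return rng.choice(len(costs), p=ranks / ranks.sum())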

Crossover mechanism - we consider three crossover operators (sketched after this list):

- one-point crossover (1-PC): a pair of chromosomes representing the selected individuals is cut at a randomly chosen point and the right-hand genes are swapped between the chromosomes;

- two-point crossover (2-PC): a pair of chromosomes representing the selected individuals is cut at two randomly chosen points and the genes lying between those points are exchanged between the chromosomes;

- uniform crossover: each gene of an offspring is picked at random from either of the two parent chromosomes.
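
A minimal sketch of the three operators (function names are ours); each takes two parent chromosomes and returns two offspring:

import numpy as np

rng = np.random.default_rng()

def one_point_crossover(a, b):
    cut = rng.integers(1, len(a))                        # single random cut point
    return (np.concatenate([a[:cut], b[cut:]]),
            np.concatenate([b[:cut], a[cut:]]))

def two_point_crossover(a, b):
    lo, hi = sorted(rng.choice(len(a), size=2, replace=False))
    c, d = a.copy(), b.copy()
    c[lo:hi], d[lo:hi] = b[lo:hi], a[lo:hi]              # swap the middle segment
    return c, d

def uniform_crossover(a, b):
    pick = rng.integers(0, 2, size=len(a)).astype(bool)  # gene-by-gene parent choice
    return np.where(pick, a, b), np.where(pick, b, a)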

Mutation mechanism - a node is moved to a randomly chosen subgraph, or two nodes are exchanged between subgraphs.
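
A minimal sketch of both mutation schemes on the chromosome encoding above (function names are ours):

import numpy as np

rng = np.random.default_rng()

def mutate_move(chrom, p):
    """Move a randomly chosen node to a randomly chosen subgraph."""
    c = chrom.copy()
    c[rng.integers(len(c))] = rng.integers(p)
    return c

def mutate_exchange(chrom):
    """Exchange the subgraphs of two randomly chosen nodes."""
    c = chrom.copy()
    i, j = rng.choice(len(c), size=2, replace=False)
    c[i], c[j] = c[j], c[i]
    return c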

Strategy for offspring inclusion - after each reproduction a decision must be made whether or not an offspring should replace its parents. We assume that the better of the offspring and the better of the parents are kept.
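
A minimal sketch of this replacement rule (the function name is ours); keeping the better parent and the better offspring leaves the population size unchanged after each reproduction:

def replace(parents, offspring, fitness):
    """Survivors of one reproduction: the better of the two parents
    and the better of the two offspring (fitness = J(R), lower is better)."""
    return min(parents, key=fitness), min(offspring, key=fitness)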



4. Results.

The method was investigated on a real problem, the self-consistent simulation of negative central shear discharges, which has an exact solution. The graph has 1563 nodes.

All results are averaged over 100 runs. The functional E defines the relative accuracy and is used to investigate the GA:

E = (J - J*) / J*,

where J is the solution found by the GA, J* is the exact solution of the parallelization problem (calculated analytically), and t below denotes the GA iteration (generation) number.


First, the influence of the population size on the convergence of the GA was studied. The results are presented in Figure 1. Figure 2 helps to choose a population size for a predefined precision. Larger populations show better results in the high-precision case.


Figure 1. The influence of the population size on the convergence of the GA.



Figure 2. The influence of the population size on the number of iterations needed to achieve a predefined precision (55%, 45%, 75%).


Mutation brings new information into the chromosome and prevents the loss of diversity. Figures 3 and 4 present the results of comparing different mutation schemes. Random movement of nodes shows better results than exchanging nodes between subgraphs.


Figure 3. Convergence of the method for different mutation schemes.



Figure 4. Convergence of the method for different mutation rates.


Next, the selection strategies were compared (Figure 5). The rank and tournament selection schemes demonstrate the best results; the GA with rank selection converges to the exact solution slightly faster.


Figure 5. A comparison of selection strategies.


In our last experiment we analysed the influence of the crossover operator on the performance of the method. The experiment was done with a crossover probability varying from 0 to 1.0 and a mutation rate varying from 0.001 to 0.5. Figure 6 shows the results of this comparison; each operator was applied with the probability that gave the best results. The results with uniform crossover are better than with the other operators.


Figure 6. A comparison of crossover operators.

Figure 7. Performance of the algorithm for different crossover rates.



Conclusion.

An approach to the solution of the parallelization problem on the basis of a Genetic Algorithm is suggested. A numerical study of this method was carried out. The GA showed fast convergence and may be used for real-time parallelization.



References.

[1] D.E. Goldberg, Genetic Algorithms in Search, Optimization & Machine Learning, Addison-Wesley Publishing Company, 1989

[2] A.V. Nikitin, Optimization of Modular Associative Memory, Computational Mathematics and Modeling, 1999, Vol. 10, No. 4, pp. 405-412

[3] N.M. Ershov, A.M. Popov, Optimization of Parallel Computations by the Monte-Carlo Method, In: Proceedings of PaCT-93, Obninsk, Russia, 1993, Vol. 3

[4] G. von Laszewski, Intelligent Structural Operators for the K-way Graph Partitioning Problem, In: Proceedings of the 4th ICGA, 1991

[5] T. Kalinowski, Solving the Mapping Problem with a Genetic Algorithm on the MasPar-1, In: Proceedings of MPCS, Ischia, Italy, 1994