Äîêóìåíò âçÿò èç êýøà ïîèñêîâîé ìàøèíû. Àäðåñ îðèãèíàëüíîãî äîêóìåíòà : http://hpc.msu.ru/files/hpcmsu/downloads/X-Com_tutorial.pdf
Äàòà èçìåíåíèÿ: Sat Jun 16 01:49:09 2012
Äàòà èíäåêñèðîâàíèÿ: Mon Oct 1 19:26:54 2012
Êîäèðîâêà:
X-Com: Distributed Computing Software
Research Computing Center M.V. Lomonosov Moscow State University

HPC.MSU.RU


X-Com key features (1)
· X-Com takes in account all unique characteristics of distributed computational environments:
­ large scale
· thousands CPUs

­ geographically distributed resources
· very high latency in communications

­ variable set of available resources
· use node on connection · don't fail on node disconnection

­ heterogeneous resources:
· operating system (MS Windows, Linux, ...) · CPU (x86, x86_64, ia64, ...)


X-Com key features (2)
· Lightweight portable toolkit:
­ easy to install
· Perl-based software · doesn't need administrator/root access

­ easy applications adaptation
· maybe several Perl strings must be written

­ simultaneous work of different platforms
· joining Windows and Linux machines

­ distributed freely
· BSD license


X-Com basic requirements
· Linux / Windows · Perl 5.8.0+
­ for Windows:
· Strawberry Perl (preferred) · ActivePerl

· Open TCP port for communications
­ if firewall is used

· Disabled CPU time limits
­ if used on head machine of computing cluster


Structure of application in X-Com distributed environment
· Application:
­ ability to split a task into a number of independent subtasks (portions) ­ more computing, less communicating

· X-Com ­ clients & server
­ server is responsible for splitting tasks into portions and joining results ­ clients receive data from server, perform computations, send results back


General idea of X-Com computing
Server part of the application
Server API

X-Com server

Internet

X-Com clients
Client API

Client (computing) part of the application


X-Com server APIs
· Files API
­ for quite simple applications ­ only need to specify input/output paths and a file list

· Perl API
­ for more complicated applications ­ need to create 6 functions in a Perl module


X-Com server APIs: Files
· Options of the server settings file (ini-file):
­ taskList
· input files list

­ taskIn
· path where input files are stored

­ taskOut
· output path


X-Com server APIs: Perl
· Functions to be write in a task server module:
­ initialize ($taskArg)
· performs starting actions, e.g. arguments checking

­ getFirstPortionNumber ()
· returns number of the 1st portion (usually returns 1)

­ getLastPortionNumber ()
· returns number of the last portion (if we know total number of portions)

­ isFinished ()
· checks finishing conditions and returns true if the work is over (if we don't know total portions number)

­ getPortion ($N)
· returns content of Nth portions

­ addPortion ($N, $data)
· processes the result of Nth portions which is stored in $data

­ finalize ()
· makes any final actions


X-Com client APIs
· Processes API
­ only need to specify command lines for initializing and portion processing

· Perl API
­ need to create 2 Perl functions


X-Com client APIs: Processes
· Options of the server settings file (ini-file):
­ xCliInitCmd
· command line which will be run once before portions processing

­ xCliPortionCmd
· command line used for processing each portion

­ xCliPortionIn
· file name in which incoming portions will be stored

­ xCliPortionOut
· file name from which resulting data well be taken


X-Com client APIs: Perl
Functions to be write in a client module:
­ gcprepare ($addarg)
· function called once before any portion-processing actions, i.e. client part initializing
­ $addarg is an additional parameter that can be passed to the XCom client from command line

­ gctask ($task, $taskarg, $portion, $din, $dout, $addarg)
· function called for every portion
­ ­ ­ ­ ­ $task - task name $taskarg - task arguments $portion - portion number $din - file with portion content $dout - file from which results well be read


X-Com computing sequence using Perl API on both server and clients
1. Task initializing on server side Server part of the application 2. Task description request 3. Request for required files 4. Task initializing on clients 5. Portion request 6. Computations 7. Result sending 8. Finalizing
initialize() getPortion() addPortion() finalize()

X-Com server
T SR TFILE REQ ASW

X-Com client
gcprepare() gctask()

Client part of the application


X-Com files structure at a glance
· gserv
­ ­ ­ ­ XServ.pl *.pm *.ini logs

- X-Com server
executable of the server server components ini-files of server log-files saving path Pi calculation application (sample task) server part interface of Pi app. client part interface of Pi app. packed client parts

· tasks
­ Pi
· Pi.pm · gctask · *.tar.gz

- applications

· gcli
­ client.pl ­ *.pm

- X-Com client
- executable of the client - client components

· utils
­ XLogProc.pl

- utilities
- log-files analyzing tool


Simple example: distributed grep. X-Com server settings & running
# dgrep.ini

# Server info
prSock ssHost ssPort # Task info ta ta ta ta ta ta sk sk sk sk sk sk Na AP Ar Li In Ou me I gs st t = = = = = = dgrep Files 212.19 /tmp/a /tmp/a /tmp/a
# task name # task API

= /tmp/xcom.sock = newserv.srcc.msu.ru = 65002

# internal UNIX socket, need for server # external TCP interface for clients ­ host... # ... and port

2. lo lo lo

24 gs gs gs

4 .in/in.txt .in .out

# arguments (a substring to search in files) # files list # input path (Apache log-files, by day) # output path

$ ./XServ.pl dgrep.ini


Simple example: distributed grep. Client part functions & running
sub gcprepare { return 1; } sub gctask { my ($task, $taskarg, $portion, $din, $dout) = @_; my ($file, $pattern) = split (' ', $taskarg); `/bin/grep '$pattern' $din > $dout`; return 1; } 1;

$ ./client.pl ­s http://newserv.srcc.msu.ru:65002


Another example: Pi calculation using Monte Carlo method
X 1

Let Xn = (x1,y1), (x2,y2), ..., (xn,yn) are n pairs of independent random values with normal distribution
Let KXn is a number of pairs for which (xi2+yi2) < 1

0

1

Y Than 4KXn/n , n

Accuracy with quite large n: up to i-th sign after dot, where i = [-log10(2/(3n)½)]-1


Pi calculating: server part functions (1) ~/xcom/tasks/Pi/Pi.pm
sub initialize { ($portionsCount,$expo) = split (/ /, $_[0]) or die ("Pi error: incorrect arguments!"); $totalSum = 0; $donePortions = 0; $outCSV = "../tasks/Pi/out/pi_esteem_$portionsCount"."_$expo.csv"; $outRes = "../tasks/Pi/out/pi_$portionsCount"."_$expo.txt"; open E, "> $outCSV" or die ("Pi error: can't write to $outCSV!"); print E 'Portions;Drops;EstimatedPi;RealPi;Error'."\n"; close E; print STDERR "Pi initialized: $portionsCount portions, ".(10**$expo)." (10**$expo) drops in portion\n"; return 1; } sub getFirstPortionNumber { return 1; } sub getLastPortionNumber { return $portionsCount; }


Pi calculating: server part functions (2) ~/xcom/tasks/Pi/Pi.pm
sub getPortion { my ($portion) = @_; return $portion; }

sub addPortion { my ($n, $portionSum) = ($_[0], $_[1]); $totalSum = $totalSum + $portionSum; $donePortions += 1; my $dropsCount = $donePortions * (10 ** $expo); $thePI = $totalSum / $dropsCount; open E, ">> $outCSV" or die ("Pi error: can't write to $outCSV!"); my $lc_numeric = setlocale(LC_NUMERIC); setlocale(LC_NUMERIC,'ru_RU'); print E "$donePortions;$dropsCount;$thePI;$PI;". ($thePI-$PI)."\n"; setlocale(LC_NUMERIC,$lc_numeric); close E; }


Pi calculating: server part functions (3) ~/xcom/tasks/Pi/Pi.pm
sub isFinished { return 0; } sub finalize { open F, "> $outRes" or die ("Pi error: can't write to $outRes"); print F $thePI."\n"; close F; return 1; }


Pi calculating: client part functions ~/xcom/tasks/Pi/gctask
sub gcprepare { return 1; } sub gctask { my ($task,$taskarg, my ($n,$expo) = spl my $cmd = "Pi $expo $cmd = "./$cmd" if my $s = readpipe($c if ($s==-1) { return 0; } open (F,"> $dout"); print F "$s"; close (F); return 1; }

$portion,$din,$dout) = @_; it (/ /, $taskarg); "; ($^O ne 'MSWin32'); md);


Pi calculating: X-Com server settings ~/xcom/gserv/Pi.ini
# Pi.ini

# Server info
prHost prPort ssHost ssPort # Task info taskName taskAPI taskArgs = Pi = Perl = 100 9 = = = = 212.192.244.31 65000 212.192.244.31 65001
# host-port pair for internal server purposes... # ... that can be used instead of UNIX socket # external (client) interface: host... # ... and port

# 100 portions, 10

9

random drops in each


X-Com server console: starting ~/xcom/gserv
$ ./XServ.pl Pi.ini XServ: Proces Subser Downlo Task A Task n Task I Task a starting sor: vers: ad prefix: PI: ame: D: rguments: 212. 212. {htt Perl Pi 1 100 192.244.31:6500 0 192.244.31:6500 1 p://212.192.244.31:6500 1/}

9

Pi initialized: 100 portions, 1000000000 (10**9) drops in portion XServProcessor: TestPi initialized (1 -> 100)


X-Com server console: basic monitoring of computing process
P> P> P> P> P> P> P> P> P> P> P> P> P> P> P> P> P> P> P> P> ... msk. msk. msk. msk. msk. msk. msk. msk. msk. msk. msk. msk. msk. msk. msk. msk. msk. msk. msk. msk. skif_ skif_ skif_ skif_ skif_ skif_ skif_ skif_ skif_ skif_ skif_ skif_ skif_ skif_ skif_ skif_ skif_ skif_ skif_ skif_ mgu.nod mgu.nod mgu.nod mgu.nod mgu.nod mgu.nod mgu.nod mgu.nod mgu.nod mgu.nod mgu.nod mgu.nod mgu.nod mgu.nod mgu.nod mgu.nod mgu.nod mgu.nod mgu.nod mgu.nod e e e e e e e e e e e e e e e e e e e e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04.3: 04.1: 03.4: 04.4: 04.2: 03.2: 03.1: 03.3: 03.4: 03.1: 03.2: 04.2: 04.4: 04.1: 03.3: 04.3: 03.4: 03.1: 03.4: 03.1: T T T T T T T T R R R R R R R R A A R R SR SR SR SR SR SR SR SR EQ EQ EQ EQ EQ EQ EQ EQ SW SW EQ EQ

1 2 3 4 5 6 7 8 1 2 9 10


Web-based monitoring
http:///


X-Com server console: finishing
... P> msk.skif_mgu.node-33-08.1: ASW P> msk.skif_mgu.node-33-08.1: REQ P> msk.skif_mgu.node-33-03.1: ASW P> msk.skif_mgu.node-33-03.1: REQ P> msk.skif_mgu.node-33-05.1: ASW P> msk.skif_mgu.node-33-05.1: REQ P> msk.skif_mgu.node-33-07.1: ASW P> msk.skif_mgu.node-33-07.1: REQ P> msk.skif_mgu.node-33-06.1: ASW P> msk.skif_mgu.node-33-06.1: REQ P> msk.skif_mgu.node-33-02.1: ASW P> Finishing communications... XServSubserver: cannot connect to XServSubserver: cannot connect to Killed $ 96 97 97 98 98 99 99 100 95 100 100 212.192.244.31:65000, retrying (1)... 212.192.244.31:65000, retrying (2)...


Analysis of the finished calculations
~/xcom/utils/XLogProc.pl


X-Com additional features: model (testing) task
· The task aimed to model real applications for preliminary assessment of environment characteristics and application behavior · Parameters:
­ size of input/output data ­ time of portion processing ­ computational modules


X-Com additional features: proxy (buffering) servers
· Unlimited levels of hierarchy · Data (portions and files) buffering
­ less network connections ­ less traffic consumption

· Optimization of server load


X-Com additional features:
XQServ - task management subsystem
· High-level user interface
­ almost as common batch system
Web Browser XQServ Client XQServ Server

· Multiuser and multitasking

X-Com Server X-Com Server

X-Com Server

· Tasks scheduling:
­ consecutive queue ­ parallel running ­ using specified task conditions:
· CPU type, frequency, performance · OS type · RAM size

X -C o m Server Clients Ma n a g e r

X-Com Server

: Linux CPU: x86_64

: Linux CPU: x86

: MSWin32 CPU: x86_64

C PU 4 _6 86 :x


X-Com additional features: working on computing clusters
· Exclusive nodes using · Using nodes idle time · Using batch systems:
­ ­ ­ ­ Torque LoadLeveler Cleo Unicore


X-Com additional features: Flash-based computing visualization


Real distributed computing practice (1)
Computing of matrix coefficients for electromagnetic field diffraction on homogeneous dielectric free shape objects (together with Penza State University) Computing resources ­ 6 HPC sites:
­ Moscow, Chelyabinsk, Ufa, Sankt-Petersburg, Novosibirsk, Tomsk
· Linux OS · Intel Xeon, Intel Itanium, AMD Opteron CPUs · exclusive using of nodes

­ 2199 cores ­ 24.5 TFlops

Task splitting:
­ 280525 portions, about 5 minutes processing each

Time:
­ total ­ 11 hours 46 minutes ­ CPU ­ 2.7 years

Efficiency:
­ server ­ 98.81% ­ environment ­ 98.77%


Real distributed computing practice (2)
Virtual screening, i.e. searching for inhibitors for cancer proteins (together with "Molecular Technologies" company)
Source:
­ 41 tasks, 318 portions in each ­ average time of portion processing - 6 hours

Computing resources ­ SKIF MSU "Chebyshev":
­ simultaneous processing of several tasks ­ 160 cores (40 nodes) per each task ­ working together with Cleo batch system

Time:
­ total ­ 7.5 days ­ CPU ­ 10.6 years


Contacts
· http://X-Com.parallel.ru (in Russian)
­ latest versions:
· http://x-com.parallel.ru/download/xcom2.tar.gz · http://x-com.parallel.ru/download/xcom2.zip

· x-com@parallel.ru