
GT4/Condor/WSS integration notes

This page contains notes on the GT4/Condor/WSS integration, tested with Scientific Linux 3.0.5.

RPMs

Install the RPMs

echo rpm http://lhc.sinp.msu.ru/dist LCG-2_6_0 lcg_sl3 lcg_sl3.updates > /etc/apt/sources.list.d/lcg26.list
echo rpm http://grid-deployment.web.cern.ch/grid-deployment/gis apt/LCG_CA/en/i386 lcg > /etc/apt/sources.list.d/lcg-ca.list
apt-get update
apt-get install gcc rh-postgresql-jdbc rh-postgresql-server ca_Russia ca_RDIG

Java

There are two alternative ways.

Use Sun Java RPMS

Install the RPMS:

apt-get install j2sdk ant

Create the required links:

cd /usr/java/j2sdk1.4.2_08/man/man1
gzip -9 jar.1
cd /etc/alternatives
ln -fs /usr/java/j2sdk1.4.2_08/man/man1/jar.1.gz
ln -fs /usr/java/j2sdk1.4.2_08/bin/jar
ln -sf /usr/java/j2sdk1.4.2_08/bin/javac
ln -sf /usr/java/j2sdk1.4.2_08/bin/java

Use the jpackage.org RPMS

Add these sources to /etc/apt/sources.list.d/jpackage.list:

rpm http://sunsite.informatik.rwth-aachen.de/ftp/pub/Linux/jpackage 1.6/redhat-el-3.0 devel free
rpm http://sunsite.informatik.rwth-aachen.de/ftp/pub/Linux/jpackage 1.6/generic devel free non-free

Install java:

apt-get update
apt-get install java-1.5.0-sun ant

Users and groups

Users and groups for the GT4 toolkit

groupadd -g 110 globus
useradd -u 110 -g 110 -s /bin/bash -c "Globus toolkit user" globus

Test user to be mapped (the gridmapped group is created later, in the GRAM configuration section)

useradd -G gridmapped testuser

SSH Host-based authentication

Create the /etc/ssh/ssh_known_hosts file. You can use the script like this:

rm -f /etc/ssh/ssh_known_hosts
for host in lcg05 lcg07 lcg09 ; do 
  echo $host,`host $host | cut -d ' ' --output-delimiter=, -f 1,4` ssh-dss `ssh-keyscan -t dsa $host 2>/dev/null | cut -d ' ' -f 3` >> /etc/ssh/ssh_known_hosts
  echo $host,`host $host | cut -d ' ' --output-delimiter=, -f 1,4` ssh-rsa `ssh-keyscan -t rsa $host 2>/dev/null | cut -d ' ' -f 3` >> /etc/ssh/ssh_known_hosts
done

Create the /etc/ssh/shosts.equiv file.

cat > /etc/ssh/shosts.equiv <<EOF
lcg05.sinp.msu.ru
lcg07.sinp.msu.ru
lcg09.sinp.msu.ru
EOF

Enable HostbasedAuthentication for the sshd. Edit the file /etc/ssh/sshd_config and add "HostbasedAuthentication yes". You may use this patch with the ssh from SL 3.0.5:

cd /etc/ssh
patch -l -p0 <<EOF
--- sshd_config.orig    2005-09-13 11:10:01.000000000 +0400
+++ sshd_config 2005-09-13 11:10:08.000000000 +0400
@@ -48,7 +48,7 @@
 # For this to work you will also need host keys in /etc/ssh/ssh_known_hosts
 #RhostsRSAAuthentication no
 # similar for protocol version 2
-#HostbasedAuthentication no
+HostbasedAuthentication yes
 # Change to yes if you don't trust ~/.ssh/known_hosts for
 # RhostsRSAAuthentication and HostbasedAuthentication
 #IgnoreUserKnownHosts no
EOF

Make HostbasedAuthentication the default for the ssh clients. Edit /etc/ssh/ssh_config and add

HostbasedAuthentication yes
EnableSSHKeysign yes

to the "Host *" section. You may use this patch with the ssh from SL 3.0.5:

cd /etc/ssh
patch -l -p0 <<EOF
--- ssh_config.orig     2005-09-13 11:18:31.000000000 +0400
+++ ssh_config  2005-09-13 11:21:52.000000000 +0400
@@ -36,3 +36,5 @@
 #   EscapeChar ~
 Host *
       ForwardX11 yes
+       HostbasedAuthentication yes
+       EnableSSHKeysign yes
EOF

Restart the sshd:

service sshd restart
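
A quick way to verify host-based authentication between the configured hosts (assuming a common account such as testuser exists on both machines) is to run a remote command and check that no password is requested, for example:

ssh -o PreferredAuthentications=hostbased lcg05.sinp.msu.ru hostname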

Torque (PBS)

First configure SSH host-based authentication; see the corresponding section of this document.

Server node

Install the torque on the server node.

tar xvfz torque-1.2.0p6.tar.gz
cd torque-1.2.0p6
./configure --disable-mom --disable-gui --set-server-home=/usr/local/spool/PBS --enable-syslog --with-scp
make
make install
./torque.setup root
qterm -t quick

Configure the server_name:

cd /usr/local/spool/PBS
echo lcg09.sinp.msu.ru > server_name

Add the list of the nodes to the nodes file:

cd /usr/local/spool/PBS
cat > server_priv/nodes <<EOF
lcg05.sinp.msu.ru np=2
EOF

With the nodes configured, start the server and the scheduler:

pbs_server
pbs_sched

Worker nodes

Install the torque on the worker nodes.

tar xvfz torque-1.2.0p6.tar.gz
cd torque-1.2.0p6
./configure --disable-server --disable-gui --set-server-home=/usr/local/spool/PBS --enable-syslog --with-scp
make
make install

Configure the pbs_mom:

cd /usr/local/spool/PBS
echo lcg09.sinp.msu.ru > server_name
cat > mom_priv/config <<EOF
\$clienthost 213.131.5.9
\$logevent 255
\$restricted 213.131.5.9
EOF

Start the pbs_mom:

pbs_mom
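
Once the first worker node is up, the setup can be sanity-checked from the server node: the node should (eventually) show up as free, and a trivial job, submitted as a regular user such as testuser, should run. For example:

pbsnodes -a
qmgr -c "print server"
echo "sleep 30" | qsub
qstat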

Shared-filesystem configuration

This is an optional configuration of PBS with shared home directories. Install and configure the YP (NIS) server:

apt-get install ypserv
echo NISDOMAIN=gt4farm >> /etc/sysconfig/network
chkconfig --level 345 ypserv on
/etc/init.d/ypserv start
make -C /var/yp

Configure and start the NFS server:

echo "/home lcg*.sinp.msu.ru(rw,no_root_squash,sync)" >> /etc/exports
chkconfig --level 345 nfs on
/etc/init.d/nfs start

The server is now ready, and we should configure the nodes.

apt-get install ypbind
echo NISDOMAIN=gt4farm >> /etc/sysconfig/network
chkconfig --level 345 ypbind on
/etc/init.d/ypbind start
echo "lcg09.sinp.msu.ru:/home /home                   nfs     defaults        0 0" >> /etc/fstab
mount /home

GT4 Installation and configuration

Directories

mkdir /usr/local/globus-4.0.1
chown globus:globus /usr/local/globus-4.0.1

GT4 installation

Become the globus user. If you use Sun Java, set

export JAVA_HOME=/usr/java/j2sdk1.4.2_08
export JAVAC_PATH=/usr/java/j2sdk1.4.2_08/bin/javac

Extract the installation tarball, cd into the installer directory, and set up the environment:

tar xfj gt4.0.1-all-source-installer.tar.bz2
cd gt4.0.1-all-source-installer
export GLOBUS_LOCATION=/usr/local/globus-4.0.1

With PBS do

export PBS_HOME=/usr/local/spool/PBS
./configure --prefix=$GLOBUS_LOCATION --enable-wsgram-pbs
make 2>&1 | tee build.log
make install

Without PBS do

./configure --prefix=$GLOBUS_LOCATION
make 2>&1 | tee build.log
make install

Certificates

Obtain the certificates for the host and place them in /etc/grid-security/hostcert.pem and hostkey.pem. Then, in /etc/grid-security, create a copy of the certificates for the GT4 container and set the correct permissions:

chmod 400 hostkey.pem
chmod 644 hostcert.pem
cp hostcert.pem containercert.pem
cp hostkey.pem containerkey.pem
chown globus:globus container*.pem
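
If the host certificate still has to be requested from your CA (RDIG in our case), the GT4 tools can generate the request; a minimal sketch, run as root with the hostname adjusted to your machine:

source /usr/local/globus-4.0.1/etc/globus-user-env.sh
grid-cert-request -host lcg09.sinp.msu.ru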

Environment

Create two shell profile scripts; uncomment the Java lines if you used the Sun RPMs.

/etc/profile.d/globus.sh:

#export JAVA_HOME=/usr/java/j2sdk1.4.2_08
#export JAVAC_PATH=/usr/java/j2sdk1.4.2_08/bin/javac
export PBS_HOME=/usr/local/spool/PBS
export GLOBUS_LOCATION=/usr/local/globus-4.0.1
. $GLOBUS_LOCATION/etc/globus-user-env.sh

/etc/profile.d/globus.csh:

#setenv JAVA_HOME /usr/java/j2sdk1.4.2_08
#setenv JAVAC_PATH /usr/java/j2sdk1.4.2_08/bin/javac
setenv PBS_HOME /usr/local/spool/PBS
setenv GLOBUS_LOCATION /usr/local/globus-4.0.1
source $GLOBUS_LOCATION/etc/globus-user-env.csh

Configure GridFTP

Create entries in /etc/services:

gridftp         2811/tcp
gridftp         2811/udp

Create config /etc/grid-security/gridftp.conf:

port 2811
allow_anonymous 0
inetd 1

Create xinetd service config /etc/xinetd.d/gridftp:

service gridftp
{
        instances       = 100
        socket_type     = stream
        wait            = no
        user            = root
        env             += GLOBUS_LOCATION=/usr/local/globus-4.0.1
        env             += LD_LIBRARY_PATH=/usr/local/globus-4.0.1/lib
        server          = /usr/local/globus-4.0.1/sbin/globus-gridftp-server
        server_args     = -i
        log_on_success  += DURATION
        nice            = 10
        disable         = no
}

Reload the xinetd service:

service xinetd reload
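
After a user proxy is available (the user's DN must also be in the grid-mapfile, see below), the server can be smoke-tested with a simple transfer, for example:

grid-proxy-init
globus-url-copy file:///etc/group gsiftp://lcg09.sinp.msu.ru/tmp/group.copy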

Configure PostgreSQL

Start postgresql and turn it on for autostart:

chkconfig --level 345 rhdb on
service rhdb start

Edit /var/lib/pgsql/data/pg_hba.conf, add lines:

host   all      all             127.0.0.1          255.255.255.255   password
host   all      all             213.131.5.7        255.255.255.255   password

Edit /var/lib/pgsql/data/postgresql.conf, uncomment the line:

tcpip_socket = true

Restart the postgresql service:

service rhdb restart

Create the globus PostgreSQL user and fill the RFT database; use the password athdavRi when prompted:

su - postgres -c "createuser -A -D -P -E globus"
su - postgres -c "createdb -O globus rftDatabase"
su - postgres -c "psql -U globus -h localhost -d rftDatabase -f $GLOBUS_LOCATION/share/globus_wsrf_rft/rft_schema.sql"

Configure RFT

Edit the file $GLOBUS_LOCATION/etc/globus_wsrf_rft/jndi-config.xml and change the postgres database password to the one used in the PostgreSQL configuration step.
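
The database settings live in the dbConfiguration resource of that file; the fragment to change looks roughly like this (parameter names should be double-checked against your installed file), using the password chosen above:

<parameter>
  <name>userName</name>
  <value>globus</value>
</parameter>
<parameter>
  <name>password</name>
  <value>athdavRi</value>
</parameter>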

Configure GRAM

Create the group for the gridmapped users:

groupadd -g 31337 gridmapped

Add every grid-mapped user to this group (a gpasswd example is given after the sudo entries below). Then configure sudo: run visudo and add these lines at the end:

# Globus GRAM entries
globus  ALL=(%gridmapped) NOPASSWD: /usr/local/globus-4.0.1/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /usr/local/globus-4.0.1/libexec/globus-job-manager-script.pl *
globus  ALL=(%gridmapped) NOPASSWD: /usr/local/globus-4.0.1/libexec/globus-gridmap-and-execute -g /etc/grid-security/grid-mapfile /usr/local/globus-4.0.1/libexec/globus-gram-local-proxy-tool *
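
Instead of editing /etc/group by hand, each mapped account can be added to the group with gpasswd, e.g.:

gpasswd -a testuser gridmapped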

Set up the PBS jobmanager to use ssh, if GT4 was configured with PBS support:

cd $GLOBUS_LOCATION/setup/globus
./setup-globus-job-manager-pbs --remote-shell=ssh

Don't forget to configure host-based authentication for SSH as described in the Torque/PBS section.

grid-mapfile entries

Add new entries to the grid-mapfile:

grid-mapfile-add-entry -dn "/C=RU/O=DataGrid/OU=sinp.msu.ru/CN=Lev Shamardin" -ln testuser

Start the container

During debugging it is recommended to start the container like this:

touch /var/log/globus-container.log
chown globus:globus /var/log/globus-container.log
/usr/local/globus-4.0.1/bin/globus-start-container > /var/log/globus-container.log 2>&1 &
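
Once a grid-mapfile entry exists (previous section), the running container can be tested with a simple WS-GRAM job from any host with a valid user proxy, for example (adjust the factory URL to your container host and port):

globusrun-ws -submit -F https://lcg09.sinp.msu.ru:8443/wsrf/services/ManagedJobFactoryService -c /bin/date

Add -Ft PBS to submit through the PBS factory instead of the default Fork one.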

Condor installation and configuration

Install the Condor rpm:

rpm -ivh condor-6.7.10-linux-x86-glibc23-dynamic-1.i386.rpm

Create the condor user and reconfigure condor with the new user:

useradd -u 120 -g users -c "Condor user" condor
cd /opt/condor-6.7.10
./condor_configure --install-dir=/opt/condor-6.7.10 --owner=condor --type=submit,execute,manager --verbose

Edit /opt/condor-6.7.10/etc/condor_config if required.
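
The defaults written by condor_configure are usually fine for a small test pool. A couple of settings worth checking (macro names as in the 6.7 series; the values here are only an illustration):

CONDOR_HOST = $(FULL_HOSTNAME)
HOSTALLOW_WRITE = *.sinp.msu.ru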

Create the condor shell profile scripts.

/etc/profile.d/condor.sh:

export CONDOR_CONFIG=/opt/condor-6.7.10/etc/condor_config
if [ `id -u` = 0 ] ; then
  export PATH="$PATH:/opt/condor-6.7.10/bin:/opt/condor-6.7.10/sbin"
else
  export PATH="$PATH:/opt/condor-6.7.10/bin"
fi

/etc/profile.d/condor.csh

setenv CONDOR_CONFIG "/opt/condor-6.7.10/etc/condor_config"
if ( `id -u` == 0 ) then
  set path = ( $path /opt/condor-6.7.10/bin /opt/condor-6.7.10/sbin )
else
  set path = ( $path /opt/condor-6.7.10/bin )
endif

Create the scratch directories: on each execution host, run the following in the user's home directory:

mkdir -p $HOME/.globus/scratch

LCMAPS

Create the pool user accounts and populate the gridmapdir:

wget http://www-unix.mcs.anl.gov/~tfreeman/local/pooled/admin/addpoolusers.sh
patch -p0 -l <<EOF
--- addpoolusers.sh.orig        2005-09-22 22:37:00.000000000 +0400
+++ addpoolusers.sh     2005-09-22 22:37:29.000000000 +0400
@@ -8,11 +8,11 @@
 # Andrew McNab <mcnab@hep.man.ac.uk>  March 2001
 #
 
-startUID=9000           # start UID of first user
-  endUID=9010           # UID of last user - no more than startUID+999
+startUID=2000           # start UID of first user
+  endUID=2010           # UID of last user - no more than startUID+999
    group=users          # group to assign all pool users to
-  prefix=gpool          # prefix, eg gpool000, gpool001, ...
-homedirs=/home/gpool    # where to make the home directories
+  prefix=mapped         # prefix, eg gpool000, gpool001, ...
+homedirs=/home/mapped   # where to make the home directories
 
 ########## You dont need to edit anything below this line    #########
 ########## but you should make sure you understand it before #########
EOF
mkdir /home/mapped
mkdir -p /etc/grid-security/gridmapdir
sh addpoolusers.sh

Install the LCMAPS build environment:

apt-get install cvs automake autoconf libtool bison flex openldap-devel

Install the latest LCMAPS version from the CVS to /opt/lcmaps:

cd /var/tmp
mkdir egee && cd egee
export CVSROOT=":pserver:anonymous@jra1mw.cvs.cern.ch:/cvs/jra1mw"
cvs co org.glite org.glite.security
cvs co -r glite-security-lcmaps-1_3_1-multiple-accounts org.glite.security.lcmaps \
  org.glite.security.lcmaps-interface org.glite.security.lcmaps-plugins-basic \
  org.glite.security.lcmaps-plugins-voms
for i in lcmaps lcmaps-interface lcmaps-plugins-basic lcmaps-plugins-voms ; do \
  cp org.glite.security/project/*.m4 org.glite.security.$i/project; \
  cp org.glite/project/*m4 org.glite.security.$i/project; \
done
export LSTAGEDIR=/opt/lcmaps
for i in lcmaps lcmaps-interface lcmaps-plugins-basic lcmaps-plugins-voms ; do
  (cd org.glite.security.$i;
  make distclean;
  ./bootstrap;
  ./configure  --prefix=$LSTAGEDIR --without-gsi-mode;
  make install);
done

Workspace Service

(Does not work yet)

Download and deploy the Workspace service. Before deploying, edit $WORKSPACE_HOME/service/java/source/deploy-jndi-config.xml and set the path to the lcmaps conf file to /etc/grid-security/lcmaps-wss.conf. You must deploy the service as the globus user with the globus container not running:

wget http://www-unix.mcs.anl.gov/workspace/workspaceService_tech_preview_4_1.tgz
tar xvzf workspaceService_tech_preview_4_1.tgz
cd workspaceService
export WORKSPACE_HOME=`pwd`
wget http://www.mcs.anl.gov/workspace/glite-security-util-java.jar
mv glite-security-util-java.jar $WORKSPACE_HOME/service/java/source/lib
source $GLOBUS_LOCATION/etc/globus-devel-env.sh
cd $WORKSPACE_HOME
vi $WORKSPACE_HOME/service/java/source/deploy-jndi-config.xml
ant deploy

Add this line to the $GLOBUS_LOCATION/container-log4j.properties:

log4j.category.org.globus.workspace=INFO

LCMAPS backend

First build and install the LCMAPS libraries. Then cd to $WORKSPACE_HOME/local/lcmaps/source and edit the Makefile, changing the paths to the LCMAPS includes and libraries. In our case:

patch -p0 -l <<EOF
--- Makefile.orig       2005-09-23 12:58:50.000000000 +0400
+++ Makefile    2005-09-23 14:27:11.000000000 +0400
@@ -1,7 +1,7 @@
 MYCFLAGS = -g -Wall -O1
 EXEC = lcmaps_poolindex
-MYINCS = -I/opt/glite/include/glite/security/lcmaps_without_gsi/ 
-MYLIBDIRS = -L/opt/glite/lib
+MYINCS = -I/opt/lcmaps/include/glite/security/lcmaps_without_gsi
+MYLIBDIRS = -L/opt/lcmaps/lib
EOF

Build and install the lcmaps_poolindex binary:

make install

As root, add the path to the LCMAPS libraries (/opt/lcmaps/lib) to /etc/ld.so.conf and regenerate the ld cache:

echo "/opt/lcmaps/lib" >> /etc/ld.so.conf
ldconfig

Install the lcmaps config file with the correct permissions (do it as root):

install -o globus -g globus -m 0644 $WORKSPACE_HOME/local/lcmaps/source/lcmaps_poolindex.conf /etc/grid-security/lcmaps-wss.conf

Edit /etc/grid-security/lcmaps-wss.conf; you should change at least LCMAPS_LOG_FILE and LCMAPS_DB_FILE. We recommend pointing LCMAPS_DB_FILE to /etc/grid-security/lcmaps/lcmaps.db.without_gsi; you must of course create the directory and the file yourself (a sketch of the changed settings follows the install command below):

install -o root -g root -m 0755 -d /etc/grid-security/lcmaps
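
We do not reproduce the whole shipped template here; assuming it keeps the simple NAME=value format of lcmaps_poolindex.conf, the two settings we changed would look like this (the log file location is our own choice, pick any path writable by the service):

LCMAPS_LOG_FILE=/var/log/lcmaps-wss.log
LCMAPS_DB_FILE=/etc/grid-security/lcmaps/lcmaps.db.without_gsi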

Example lcmaps.db.without_gsi (must be owned by root):

# LCMAPS policy file/plugin definition
#
# default path
path = /opt/lcmaps/lib/modules

# Plugin definitions:
good             = "lcmaps_dummy_good.mod"
bad              = "lcmaps_dummy_bad.mod"
localaccount     = "lcmaps_localaccount.mod -gridmapfile /etc/grid-security/grid-mapfile"
vomslocalgroup   = "lcmaps_voms_localgroup_without_gsi.mod -groupmapfile /etc/grid-security/groupmapfile -mapmin 0"
vomspoolaccount  = "lcmaps_voms_poolaccount_without_gsi.mod -gridmapfile /etc/grid-security/grid-mapfile -gridmapdir /etc/grid-security/gridmapdir -do_not_use_secondary_gids"

posixenf = "lcmaps_posix_enf.mod -maxuid 1 -maxpgid 1 -maxsgid 32 "
poolaccount = "lcmaps_poolaccount.mod  -gridmapfile /etc/grid-security/grid-mapfile -gridmapdir /etc/grid-security/gridmapdir/ -override_inconsistency"

# Policies:
das_voms:
localaccount -> good | poolaccount
poolaccount -> good

It seems that the lcmaps_poolindex executable actually has to be setuid root; at least we did not manage to make it work from the globus account without it. Set the permissions on the lcmaps_poolindex binary (note that chown must come before chmod, since chown clears the setuid bit):

chown root:root $GLOBUS_LOCATION/bin/lcmaps_poolindex
chmod 04711 $GLOBUS_LOCATION/bin/lcmaps_poolindex

Workspace authorization

Edit (or create) the file $GLOBUS_LOCATION/etc/workspace_service/dn-authz and add your users' DNs to it.
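
Assuming the file simply lists one subject DN per line (check the Workspace Service documentation for the exact format), an entry would look like:

/C=RU/O=DataGrid/OU=sinp.msu.ru/CN=Lev Shamardin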

We are not going to rebuild GRAM without grid-mapfile support, since this is a test setup. So just edit these files and change <authz value="gridmap"/> to the appropriate value:

  • $GLOBUS_LOCATION/etc/globus_delegation_service/factory-security-config.xml
  • $GLOBUS_LOCATION/etc/globus_wsrf_rft/factory-security-config.xml
  • $GLOBUS_LOCATION/etc/gram-service/managed-job-factory-security-config.xml
<authz value="gram:org.globus.workspace.QueryPDP"/>

Known issues

  • Security issue: quote from the GT4 admin guide: "WSRF-based components ignore the signing policy file and will honor all valid certificates issued by trusted CAs".
  • The Condor GT4 adapter assumes that the directory $HOME/.globus/scratch exists for the user under whose permissions the job is executed, but it never tries to create it. It must be created manually.
  • GT4 seems to be unstable with Torque: jobs can hang in the "unsubmitted" state (see this thread for more details). There seems to be a solution to this problem, but it is not well proven to work (see this message).

Condor with GT4 job submission

Since I have no access to the Condor sources, all of the following is just my own deduction and speculation.

The Condor-C GT4 GAHP helper assumes that:

  • The GridFTP server runs on both the Condor-C submission host and the GT4 gatekeeper host.
  • The user's certificate on the Condor-C submission host is mapped to the submitting user.
  • The user's certificate on the GT4 host may be mapped to any user.

These assumptions make it impossible to submit a job to GT4 from a Condor instance running on the same host if the submitting user is mapped to another user in the grid-mapfile. However, there are (?) some workarounds that will be described in the "Dirty tricks" section.

Dirty tricks
