Cluster Installation

The following does ''not'' refer to the  [http://xds.mpimf-heidelberg.mpg.de/html_doc/xds_parameters.html#CLUSTER_NODES= CLUSTER_NODES=] setup. The latter does ''not'' require a queueing system!


XDS can be run on a cluster using batch job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF or SLURM. These are distributed resource management systems that monitor the CPU and memory usage of the available computing resources and schedule jobs to the least used computers.
== setup of XDS for a batch queue system ==
In order to set up XDS for a queueing system, the ''forkxds'' script needs to be changed to use qsub instead of ssh. Example scripts used for Univa Grid Engine (UGE) at Diamond (from https://github.com/DiamondLightSource/fast_dp/tree/master/etc/uge_array - thanks to Graeme Winter!) are below; they may need to be adapted to the specific environment and queueing system.
<pre>
#!/bin/bash
# forkxds
#                    forkxds          Version DLS-2017/08
#
# enables  multi-tasking by splitting the COLSPOT and INTEGRATE
# steps of xds into independent jobs. Each job is carried out by
# a Fortran main program (mcolspot, mcolspot_par, mintegrate, or
# mintegrate_par). The jobs are distributed among the processor
# nodes of the NFS cluster network.
#
# 'forkxds' is called by xds or xds_par by the Fortran instruction
# CALL SYSTEM('forkxds ntask maxcpu main rhosts'),
#    ntask  ::total number of independent jobs (tasks)
#  maxcpu  ::maximum number of processors used by each job
#    main  ::name of the main program to be executed; could be
#            mcolspot | mcolspot_par | mintegrate | mintegrate_par
#  rhosts  ::names of CPU cluster nodes in the NFS network
#
# DLS UGE port of script to operate nicely with cluster
# scheduling system - will work with any XDS usage but is
# aimed for fast_dp see fast_dp#3. Options passed through environment:
#
# FORKXDS_PRIORITY - priority within queue, e.g. 1024
# FORKXDS_PROJECT - UGE project to assign for this
# FORKXDS_QUEUE - queue to submit to
ntask=$1  #total number of jobs
maxcpu=$2 #maximum number of processors used by each job
main=$3  #name of the main program to be executed
rm -f forkxds.params
itask=1
while test $itask -le $ntask
do
  echo $main >> forkxds.params
  itask=`expr $itask + 1`
done
# save environment
echo "PATH=$PATH" > forkxds.env
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" >> forkxds.env
# check environment for queue; project; priority information
qsub_opt=""
if [[ -n "$FORKXDS_PRIORITY" ]] ; then
    qsub_opt="$qsub_opt -p $FORKXDS_PRIORITY"
fi
if [[ -n "$FORKXDS_PROJECT" ]] ; then
    qsub_opt="$qsub_opt -P $FORKXDS_PROJECT"
fi
if [[ -n "$FORKXDS_QUEUE" ]] ; then
    qsub_opt="$qsub_opt -q $FORKXDS_QUEUE"
fi
qsub $qsub_opt -sync y -V -cwd -pe smp $maxcpu -t 1-$ntask `which forkxds_job`
</pre>
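The script above replaces the ssh fan-out of the original forkxds with a single UGE array job: -t 1-$ntask creates one task per XDS job, -pe smp $maxcpu reserves the requested number of slots for each task, and -sync y makes qsub block until the whole array has finished, taking the place of the wait for background processes in the original script. For illustration only, with ntask=4, maxcpu=8 and FORKXDS_QUEUE=all.q (made-up values; the path to forkxds_job depends on the installation), the final qsub line expands to roughly:
<pre>
# illustrative expansion of the submission line for 4 jobs with 8 processors each
qsub -q all.q -sync y -V -cwd -pe smp 8 -t 1-4 /path/to/forkxds_job
</pre>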
<pre>
#!/bin/bash
# forkxds_job
# pick the program name for this array task from forkxds.params
params=$(awk "NR==$SGE_TASK_ID" forkxds.params)
JOB=`echo $params | awk '{print $1}'`
# load environment
. forkxds.env
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH
export PATH=$PATH
echo $SGE_TASK_ID | $JOB    # the task number on stdin tells the program which job to process
</pre>
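The FORKXDS_* variables are read from the environment that forkxds inherits from xds_par, so they only need to be set in the shell that starts the processing. A minimal usage sketch, with made-up queue, project and priority values that have to be replaced by site-specific ones, could look like:
<pre>
# illustrative only - queue, project and priority are placeholders
export FORKXDS_QUEUE=all.q
export FORKXDS_PROJECT=mx12345
export FORKXDS_PRIORITY=1024
xds_par    # the COLSPOT and INTEGRATE tasks are then submitted through qsub by forkxds
</pre>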
== setup of XDS for a containerized system ==
In order to set up XDS for a container system, the ''forkxds'' script needs to be changed. Emilio Centeno <ecenteno@cells.es> created a "container-friendly" forkxds. This is the affected piece of code, with the changed lines marked and the original line kept as a comment:
<pre>
do
  if [ $nhosts -gt 1 ]        #distribute jobs among the cluster nodes
  then
      j=$(( (itask - 1) % nhosts ))   #changed from "% nhosts + 1" to "% nhosts"
      # Original line:
      # echo "$itask" | ssh -x ${rhosts[$j]} "cd $PWD && $amain && sync" &
      # Image in a file (.sif):
      # echo "$itask" | ssh -o StrictHostKeyChecking=no -x ${rhosts[$j]} "cd $PWD && ml Apptainer && apptainer exec /container_path/my_xds_container.sif $amain && sync" &
      # Image already loaded in an instance:
      echo "$itask" | ssh -o StrictHostKeyChecking=no -x ${rhosts[$j]} "cd $PWD && ml Apptainer && apptainer exec instance://my_xds_instance $amain && sync" &
  else
      echo "$itask" | $amain && sync &  #submit all jobs to the peer node
  fi
  pids="$pids $!"            #append id of the new background process
  itask=`expr $itask + 1`
  # NOTE: the sync after $amain completes pending disk writes on each node
done
</pre>
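This variant assumes that an Apptainer instance containing XDS is already running on every node listed in rhosts before xds_par is started. A minimal sketch for preparing such an instance, reusing the placeholder image path and instance name from the snippet above, could be:
<pre>
# run once on each cluster node beforehand; image path and instance name are placeholders
ml Apptainer        # load the site's Apptainer module
apptainer instance start /container_path/my_xds_container.sif my_xds_instance
# ... after processing has finished:
apptainer instance stop my_xds_instance
</pre>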
== Performance ==
Cluster nodes may have different numbers of processors.
Please note that the output line
 number of OpenMP threads used  NN
in COLSPOT.LP and INTEGRATE.LP may be incorrect if MAXIMUM_NUMBER_OF_JOBS > 1 and the submitting node (the node that runs xds_par) has a different number of processors than the processing node(s) (the nodes that run mcolspot_par and mintegrate_par). The actual numbers of threads on the processing nodes may be obtained by
 grep PARALLEL COLSPOT.LP
 grep USING INTEGRATE.LP | uniq
The algorithm that determines the number of threads used on a processing node is:
 NB = DELPHI / OSCILLATION_RANGE   # this may be slightly adjusted by XDS if DATA_RANGE / NB is not integer
 NCORE = number of processors in the processing node, obtained by OMP_GET_NUM_PROCS()
 if MAXIMUM_NUMBER_OF_PROCESSORS is not specified in XDS.INP then MAXIMUM_NUMBER_OF_PROCESSORS = NCORE
 number_of_threads = MIN( NB, NCORE, MAXIMUM_NUMBER_OF_PROCESSORS, 99 )
This is implemented from BUILT=20191015 onwards.
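As a worked example with made-up values: DELPHI=5, OSCILLATION_RANGE=0.5 and a 48-core processing node (MAXIMUM_NUMBER_OF_PROCESSORS not given in XDS.INP) yield
 NB = 5 / 0.5 = 10
 number_of_threads = MIN( 10, 48, 48, 99 ) = 10
i.e. each job would use only 10 of the 48 cores of such a node.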