Cluster Installation
The following does ''not'' refer to the [http://xds.mpimf-heidelberg.mpg.de/html_doc/xds_parameters.html#CLUSTER_NODES= CLUSTER_NODES=] setup. The latter does ''not'' require a queueing system!
XDS can be run on a cluster using any batch job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF or SLURM. These are distributed resource management systems which monitor the CPU and memory usage of the available computing resources and schedule jobs to the least used computers.
== setup of XDS for a batch queue system ==
In order to set up XDS for a queueing system, the ''forkxds'' script needs to be changed to use qsub instead of ssh. Example scripts used with Univa Grid Engine (UGE) at Diamond (from https://github.com/DiamondLightSource/fast_dp/tree/master/etc/uge_array - thanks to Graeme Winter!) may serve as a starting point; they may need to be changed for the specific environment and queueing system. The essential line of the per-task job script is the one that pipes the array task number into the XDS program to be run:
<pre>
echo $SGE_TASK_ID | $JOB
</pre>
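A minimal sketch of what a qsub-based ''forkxds'' replacement could look like is given below. It is not the Diamond script; it assumes the argument convention of the forkxds shipped with XDS ($1 = number of tasks, $2 = processors per task, $3 = program to run) and a Grid Engine array job. The helper script name ''forkxds_job'' and the parallel environment ''smp'' are placeholders that must be adapted to the local site.
<pre>
#!/bin/bash
# sketch only - adapt queue, parallel environment and resource requests to the local site
ntask=$1     # number of independent tasks requested by xds_par
maxcpu=$2    # processors to be used by each task
main=$3      # program each task runs (mcolspot_par or mintegrate_par)

# submit one array job with $ntask tasks and wait (-sync y) until all of them have finished;
# each task runs the helper script forkxds_job, which contains essentially
#    JOB=$1; echo $SGE_TASK_ID | $JOB
qsub -sync y -cwd -V -t 1-$ntask -pe smp $maxcpu forkxds_job $main
</pre>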
== setup of XDS for a containerized system ==
In order to set up XDS for a container system, the ''forkxds'' script needs to be changed as well. Emilio Centeno <ecenteno@cells.es> created a "container-friendly" forkxds. This is the affected piece of code, with the changed lines marked by comments:
<pre>
do
   if [ $nhosts -gt 1 ]   #distribute jobs among the cluster nodes
   then
      j=$(( (itask - 1) % nhosts ))   #changed from '% nhosts + 1' to '% nhosts'
      # Original line
      #   echo "$itask" | ssh -x ${rhosts[$j]} "cd $PWD && $amain && sync" &
      # Image in file (.sif)
      #   echo "$itask" | ssh -o StrictHostKeyChecking=no -x ${rhosts[$j]} "cd $PWD && ml Apptainer && apptainer exec /container_path/my_xds_container.sif $amain && sync" &
      # Image already loaded in an instance
      echo "$itask" | ssh -o StrictHostKeyChecking=no -x ${rhosts[$j]} "cd $PWD && ml Apptainer && apptainer exec instance://my_xds_instance $amain && sync" &
   else
      echo "$itask" | $amain && sync &   #submit all jobs to the peer node
   fi
   pids="$pids $!"                       #append id of the new background process
   itask=`expr $itask + 1`
   # NOTE: sync after $amain completes pending disk writes on each node
done
</pre>
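For the ''instance://'' variant above, an Apptainer instance with the matching name has to be running on every compute node before xds_par is started. A minimal sketch of how this could be prepared, using the image path, module and instance name from the snippet above and hypothetical node names:
<pre>
# start a persistent Apptainer instance on each compute node (run once before xds_par);
# the node names are placeholders - adapt to the local cluster
for h in node1 node2 node3
do
   ssh -o StrictHostKeyChecking=no -x $h \
      "ml Apptainer && apptainer instance start /container_path/my_xds_container.sif my_xds_instance"
done
</pre>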
== Performance ==
Cluster nodes may have different numbers of processors.
Please note that the output line
 number of OpenMP threads used NN
in COLSPOT.LP and INTEGRATE.LP may be incorrect if MAXIMUM_NUMBER_OF_JOBS > 1 and the submitting node (the node that runs xds_par) has a different number of processors than the processing node(s) (the nodes that run mcolspot_par and mintegrate_par). The actual numbers of threads on the processing nodes may be obtained with
 grep PARALLEL COLSPOT.LP
 grep USING INTEGRATE.LP | uniq
The algorithm that determines the number of threads used on a processing node is:
 NB = DELPHI / OSCILLATION_RANGE      # may be slightly adjusted by XDS if DATA_RANGE / NB is not an integer
 NCORE = number of processors in the processing node, obtained by OMP_GET_NUM_PROCS()
 if MAXIMUM_NUMBER_OF_PROCESSORS is not specified in XDS.INP then MAXIMUM_NUMBER_OF_PROCESSORS = NCORE
 number_of_threads = MIN( NB, NCORE, MAXIMUM_NUMBER_OF_PROCESSORS, 99 )
This is implemented from BUILT=20191015 onwards.
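As a worked example of the above, with hypothetical values (DELPHI=5, OSCILLATION_RANGE=0.1, a 32-core processing node, MAXIMUM_NUMBER_OF_PROCESSORS not set in XDS.INP):
 NB                           = 5 / 0.1 = 50
 NCORE                        = 32
 MAXIMUM_NUMBER_OF_PROCESSORS = 32                          (defaults to NCORE)
 number_of_threads            = MIN( 50, 32, 32, 99 ) = 32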