Cluster Installation
The following does ''not'' refer to the [http://xds.mpimf-heidelberg.mpg.de/html_doc/xds_parameters.html#CLUSTER_NODES= CLUSTER_NODES=] setup. The latter does ''not'' require a queueing system!
XDS can be run on a cluster using any batch job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF or SLURM. These are distributed resource management systems which monitor the CPU and memory usage of the available computing resources and schedule jobs to the least used computers.
== setup of XDS for a batch queue system ==
In order to set up XDS for a queueing system, the ''forkxds'' script needs to be changed to use qsub instead of ssh. Example scripts used with Univa Grid Engine (UGE) at Diamond (from https://github.com/DiamondLightSource/fast_dp/tree/master/etc/uge_array - thanks to Graeme Winter!) may serve as a starting point; they may need to be changed for the specific environment and queueing system. The essential line of the per-task job script is the one that pipes the array task number into the XDS program to be run:
<pre>
echo $SGE_TASK_ID | $JOB
</pre>
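A minimal sketch of what a qsub-based ''forkxds'' replacement could look like is given below. It is not the Diamond script; it assumes the argument convention of the forkxds shipped with XDS ($1 = number of tasks, $2 = processors per task, $3 = program to run) and a Grid Engine array job. The helper script name ''forkxds_job'' and the parallel environment ''smp'' are placeholders that must be adapted to the local site.
<pre>
#!/bin/bash
# sketch only - adapt queue, parallel environment and resource requests to the local site
ntask=$1     # number of independent tasks requested by xds_par
maxcpu=$2    # processors to be used by each task
main=$3      # program each task runs (mcolspot_par or mintegrate_par)

# submit one array job with $ntask tasks and wait (-sync y) until all of them have finished;
# each task runs the helper script forkxds_job, which contains essentially
#    JOB=$1; echo $SGE_TASK_ID | $JOB
qsub -sync y -cwd -V -t 1-$ntask -pe smp $maxcpu forkxds_job $main
</pre>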
== setup of XDS for a containerized system ==
In order to set up XDS for a container system, the ''forkxds'' script needs to be changed as well. Emilio Centeno <ecenteno@cells.es> created a "container-friendly" forkxds. This is the affected piece of code, with the changed lines marked by comments:
<pre>
do
   if [ $nhosts -gt 1 ]   #distribute jobs among the cluster nodes
   then
      j=$(( (itask - 1) % nhosts ))   #changed from '% nhosts + 1' to '% nhosts'
      # Original line
      #   echo "$itask" | ssh -x ${rhosts[$j]} "cd $PWD && $amain && sync" &
      # Image in file (.sif)
      #   echo "$itask" | ssh -o StrictHostKeyChecking=no -x ${rhosts[$j]} "cd $PWD && ml Apptainer && apptainer exec /container_path/my_xds_container.sif $amain && sync" &
      # Image already loaded in an instance
      echo "$itask" | ssh -o StrictHostKeyChecking=no -x ${rhosts[$j]} "cd $PWD && ml Apptainer && apptainer exec instance://my_xds_instance $amain && sync" &
   else
      echo "$itask" | $amain && sync &   #submit all jobs to the peer node
   fi
   pids="$pids $!"                       #append id of the new background process
   itask=`expr $itask + 1`
   # NOTE: sync after $amain completes pending disk writes on each node
done
</pre>
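For the ''instance://'' variant above, an Apptainer instance with the matching name has to be running on every compute node before xds_par is started. A minimal sketch of how this could be prepared, using the image path, module and instance name from the snippet above and hypothetical node names:
<pre>
# start a persistent Apptainer instance on each compute node (run once before xds_par);
# the node names are placeholders - adapt to the local cluster
for h in node1 node2 node3
do
   ssh -o StrictHostKeyChecking=no -x $h \
      "ml Apptainer && apptainer instance start /container_path/my_xds_container.sif my_xds_instance"
done
</pre>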
== Performance ==
Cluster nodes may have different numbers of processors.
Please note that the output line
 number of OpenMP threads used NN
in COLSPOT.LP and INTEGRATE.LP may be incorrect if MAXIMUM_NUMBER_OF_JOBS > 1 and the submitting node (the node that runs xds_par) has a different number of processors than the processing node(s) (the nodes that run mcolspot_par and mintegrate_par). The actual numbers of threads on the processing nodes may be obtained with
 grep PARALLEL COLSPOT.LP
 grep USING INTEGRATE.LP | uniq
The algorithm that determines the number of threads used on a processing node is:
 NB = DELPHI / OSCILLATION_RANGE      # may be slightly adjusted by XDS if DATA_RANGE / NB is not an integer
 NCORE = number of processors in the processing node, obtained by OMP_GET_NUM_PROCS()
 if MAXIMUM_NUMBER_OF_PROCESSORS is not specified in XDS.INP then MAXIMUM_NUMBER_OF_PROCESSORS = NCORE
 number_of_threads = MIN( NB, NCORE, MAXIMUM_NUMBER_OF_PROCESSORS, 99 )
This is implemented from BUILT=20191015 onwards.
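As a worked example of the above, with hypothetical values (DELPHI=5, OSCILLATION_RANGE=0.1, a 32-core processing node, MAXIMUM_NUMBER_OF_PROCESSORS not set in XDS.INP):
 NB                           = 5 / 0.1 = 50
 NCORE                        = 32
 MAXIMUM_NUMBER_OF_PROCESSORS = 32                          (defaults to NCORE)
 number_of_threads            = MIN( 50, 32, 32, 99 ) = 32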