The following does ''not'' refer to the [http://xds.mpimf-heidelberg.mpg.de/html_doc/xds_parameters.html#CLUSTER_NODES= CLUSTER_NODES=] setup. The latter does ''not'' require a queueing system!
XDS can be run on a cluster using any batch job scheduling software, such as Grid Engine, Condor, Torque/PBS, LSF or SLURM. These are distributed resource management systems that monitor the CPU and memory usage of the available computing resources and schedule jobs to the least-loaded computers.
== Setup of XDS for a batch queue system ==
In order to set up XDS for a queueing system, the ''forkxds'' script needs to be changed to use qsub instead of ssh. Example scripts used for Univa Grid Engine (UGE) at Diamond (from https://github.com/DiamondLightSource/fast_dp/tree/master/etc/uge_array - thanks to Graeme Winter!) are below; they may need to be adapted for the specific environment and queueing system (a hypothetical SLURM variant is sketched after the two scripts).
<pre>
#!/bin/bash
# forkxds Version DLS-2017/08
#
# enables multi-tasking by splitting the COLSPOT and INTEGRATE
# steps of xds into independent jobs. Each job is carried out by
# a Fortran main program (mcolspot, mcolspot_par, mintegrate, or
# mintegrate_par). The jobs are distributed among the processor
# nodes of the NFS cluster network.
#
# 'forkxds' is called by xds or xds_par by the Fortran instruction
# CALL SYSTEM('forkxds ntask maxcpu main rhosts'),
#    ntask  ::total number of independent jobs (tasks)
#    maxcpu ::maximum number of processors used by each job
#    main   ::name of the main program to be executed; could be
#             mcolspot | mcolspot_par | mintegrate | mintegrate_par
#    rhosts ::names of CPU cluster nodes in the NFS network
#
# DLS UGE port of script to operate nicely with cluster
# scheduling system - will work with any XDS usage but is
# aimed for fast_dp see fast_dp#3. Options passed through environment:
#
# FORKXDS_PRIORITY - priority within queue, e.g. 1024
# FORKXDS_PROJECT  - UGE project to assign for this
# FORKXDS_QUEUE    - queue to submit to

ntask=$1   # total number of jobs
maxcpu=$2  # maximum number of processors used by each job
main=$3    # name of the main program to be executed

# write one line per task; forkxds_job selects its line via $SGE_TASK_ID
rm -f forkxds.params
itask=1
while test $itask -le $ntask
do
  echo $main >> forkxds.params
  itask=`expr $itask + 1`
done

# save environment for the worker jobs
echo "PATH=$PATH" > forkxds.env
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" >> forkxds.env

# check environment for queue; project; priority information
qsub_opt=""
if [[ -n "$FORKXDS_PRIORITY" ]] ; then
  qsub_opt="$qsub_opt -p $FORKXDS_PRIORITY"
fi
if [[ -n "$FORKXDS_PROJECT" ]] ; then
  qsub_opt="$qsub_opt -P $FORKXDS_PROJECT"
fi
if [[ -n "$FORKXDS_QUEUE" ]] ; then
  qsub_opt="$qsub_opt -q $FORKXDS_QUEUE"
fi

# submit all tasks as one array job; -sync y blocks until the array has finished
qsub $qsub_opt -sync y -V -cwd -pe smp $maxcpu -t 1-$ntask `which forkxds_job`
</pre>
<pre>
#!/bin/bash
# forkxds_job: executed once per array task by the qsub call in forkxds

# pick this task's program name from forkxds.params
params=$(awk "NR==$SGE_TASK_ID" forkxds.params)
JOB=`echo $params | awk '{print $1}'`

# load the environment saved by forkxds and export it to the child process
. forkxds.env
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH
export PATH=$PATH

# the XDS main programs read their task number from stdin
echo $SGE_TASK_ID | $JOB
</pre>
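For other queueing systems, the same two-script pattern should carry over. Below is a minimal, hypothetical sketch of a SLURM variant of ''forkxds'' (it is not one of the Diamond scripts): it assumes a SLURM version with <code>sbatch --wait</code> (17.11 or later), and it requires ''forkxds_job'' to use <code>$SLURM_ARRAY_TASK_ID</code> in place of <code>$SGE_TASK_ID</code>.
<pre>
#!/bin/bash
# hypothetical SLURM variant of forkxds - a sketch, not the Diamond script
ntask=$1   # total number of jobs
maxcpu=$2  # maximum number of processors used by each job
main=$3    # name of the main program to be executed

# write one line per task; the job script selects its line via $SLURM_ARRAY_TASK_ID
rm -f forkxds.params
for itask in $(seq 1 $ntask); do
  echo $main >> forkxds.params
done

# save environment for the worker jobs
echo "PATH=$PATH" > forkxds.env
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" >> forkxds.env

# --wait blocks until the whole array has finished, mirroring qsub -sync y;
# --export=ALL and --chdir=$PWD mirror qsub -V -cwd
sbatch --wait --export=ALL --chdir=$PWD \
       --cpus-per-task=$maxcpu --array=1-$ntask `which forkxds_job`
</pre>
Whichever queueing system is used, the modified ''forkxds'' and ''forkxds_job'' must be executable and must be found on PATH ahead of the ''forkxds'' shipped with the XDS distribution, since xds_par invokes the script by name.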
== Performance ==
Cluster nodes may have different numbers of processors.
Please note that the output line
 number of OpenMP threads used NN
in COLSPOT.LP and INTEGRATE.LP may be incorrect if MAXIMUM_NUMBER_OF_JOBS > 1 and the submitting node (the node that runs xds_par) has a different number of processors than the processing node(s) (the nodes that run mcolspot_par and mintegrate_par). The actual numbers of threads on the processing nodes may be obtained with
 grep PARALLEL COLSPOT.LP
 grep USING INTEGRATE.LP | uniq
The algorithm that determines the number of threads used on a processing node is:
<pre>
NB = DELPHI / OSCILLATION_RANGE   # may be slightly adjusted by XDS if DATA_RANGE / NB is not an integer
NCORE = number of processors on the processing node, obtained by OMP_GET_NUM_PROCS()
if MAXIMUM_NUMBER_OF_PROCESSORS is not specified in XDS.INP then MAXIMUM_NUMBER_OF_PROCESSORS = NCORE
number_of_threads = MIN( NB, NCORE, MAXIMUM_NUMBER_OF_PROCESSORS, 99 )
</pre>
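As a worked example with made-up values (not defaults of any XDS.INP): DELPHI=5° and OSCILLATION_RANGE=0.1° give NB = 50, so on a 72-core node with MAXIMUM_NUMBER_OF_PROCESSORS unset the result is MIN(50, 72, 72, 99) = 50 threads. A minimal shell sketch of the formula, under the same assumptions:
<pre>
#!/bin/bash
# sketch of the thread-count formula above; all input values are assumed examples
DELPHI=5        # DELPHI= from XDS.INP, in degrees (example value)
OSC=0.1         # OSCILLATION_RANGE= from XDS.INP, in degrees per frame (example value)
NCORE=$(nproc)  # stand-in for OMP_GET_NUM_PROCS() on the processing node
MAXPROC=${MAXPROC:-$NCORE}   # MAXIMUM_NUMBER_OF_PROCESSORS; defaults to NCORE

NB=$(awk -v d=$DELPHI -v o=$OSC 'BEGIN { printf "%d", d/o }')

# number_of_threads = MIN( NB, NCORE, MAXPROC, 99 )
threads=99
for v in $NB $NCORE $MAXPROC; do
  [ "$v" -lt "$threads" ] && threads=$v
done
echo "number of OpenMP threads used $threads"
</pre>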
This is implemented from BUILT=20191015 onwards.