Cluster Installation

XDS can be run in cluster mode using any command-line job scheduling software such as Grid Engine, Condor, Torque/PBS, LSF, or SLURM. We implemented Grid Engine, a distributed resource management system that monitors the CPU and memory usage of the available computing resources and schedules each job to the least-loaded machine. Grid Engine was chosen for its high scalability, cost effectiveness, ease of maintenance, and high throughput. It was developed by Sun Microsystems (Sun Grid Engine, SGE), later acquired by Oracle, and subsequently by UNIVA. The latest versions became closed source, but the older ones are open source and are supplied with many Linux distributions, including Red Hat/CentOS 6.x. There are also the open-source Open Grid Scheduler [http://gridscheduler.sourceforge.net/] and Son of Grid Engine [https://arc.liv.ac.uk/trac/SGE].
 
Grid Engine consists of a master daemon named sgemaster, which schedules jobs to execution nodes. On each execution node a daemon named sge_execd runs the job and sends a completion signal back to sgemaster. Jobs are submitted to sgemaster with commands such as qsub, or via the DRMAA C, Java, or IDL bindings from any application that wants to run XDS.
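For example, a job script can be submitted from the command line with qsub and its state monitored with qstat; the script name below is an illustrative placeholder:

<code>
# Submit a job script to sgemaster: -V exports the current environment,
# -cwd runs the job in the current directory, -l h_rt sets a run-time limit
qsub -V -cwd -l h_rt=0:20:00 my_job.sh
qstat   # list the state of pending and running jobs
</code>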
 
 
'''XDS Cluster setup'''
 
To set up XDS in cluster mode, the forkcolspot and forkintegrate scripts need to be modified to detect the Grid Engine environment and dispatch jobs to different machines. Example scripts are given below; they need to be adapted to your environment.
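For context, the arguments these scripts receive (number of jobs and processors per job) are controlled from XDS.INP; a minimal sketch:

<code>
! Excerpt from XDS.INP - these keywords determine the ntask and maxcpu
! arguments that xds_par passes to forkcolspot and forkintegrate
MAXIMUM_NUMBER_OF_JOBS=4        ! number of independent jobs (ntask)
MAXIMUM_NUMBER_OF_PROCESSORS=8  ! OpenMP processors per job (maxcpu)
</code>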
 
<code>
#forkcolspot

ntask=$1  #total number of jobs
maxcpu=$2 #maximum number of processors used by each job
          #maxcpu=1: use 'mcolspot' (single processor)
          #maxcpu>1: use 'mcolspot_par' (openmp version)

pids=""                    #list of background process ID's
itask=1
echo "MAX CPU $maxcpu"

# Check whether this host is a Grid Engine submit host;
# qconf -sh lists all configured submit hosts
submitnodes=`qconf -sh 2> /dev/null`
thishost=`hostname`
isgrid=0
for node in $submitnodes ; do
  if [ "$node" == "$thishost" ]
  then
    isgrid=1
    echo "Grid Engine environment detected"
  fi
done

while test $itask -le $ntask
do
  if [ $maxcpu -gt 1 ]
  then
    if [ $isgrid -eq 1 ]
    then
      # -sync y makes qsub block until the job has finished, so the
      # 'wait' below still works; -V exports the environment, -cwd
      # runs the job in the current directory, h_rt limits run time
      qsub -sync y -V -l h_rt=0:20:00 -cwd \
        forkcolspot_job \
        $itask &
      #else echo "$itask" | qrsh -V -cwd "mcolspot" &
    else
      echo "$itask" | mcolspot_par &
    fi
  else
    echo "$itask" | mcolspot &
  fi
  pids="$pids $!"  #append id of the background process just started

  itask=`expr $itask + 1`
done
trap "kill -15 $pids" 2 15  # 2:Control-C; 15:kill
wait  #wait for all background processes issued by this shell
rm -f mcolspot.tmp  #this temporary file was generated by mcolspot
rm -rf fork*job*    #remove Grid Engine job output files
</code>
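
The Grid Engine detection above relies on qconf -sh, which lists the configured submit hosts; you can verify it by hand:

<code>
# The script compares `hostname` against the submit host list;
# the current machine must appear in the output of qconf -sh
qconf -sh   # print one submit host per line
hostname    # must match one of the entries above for isgrid=1
</code>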
 
----
 
<code>
#!/bin/csh
#forkcolspot_job - wrapper script submitted to Grid Engine by forkcolspot

set itask=$1  #job number passed on the qsub command line
echo $itask | mcolspot_par
</code>
 
----
 
 
<code>
#forkintegrate

fframe=$1 #id number of the first image
ni=$2     #number of images in the data set
ntask=$3  #total number of jobs
niba0=$4  #minimum number of images in a batch
maxcpu=$5 #maximum number of processors used by each job
          #maxcpu=1: use 'mintegrate' (single processor)
          #maxcpu>1: use 'mintegrate_par' (openmp version)

minitask=$(($ni / $ntask)) #minimum number of images in a job
mtask=$(($ni % $ntask))    #number of jobs with minitask+1 images
pids=""                    #list of background process ID's
nba=0
litask=0
itask=1

# Check whether this host is a Grid Engine submit host;
# qconf -sh lists all configured submit hosts
submitnodes=`qconf -sh 2> /dev/null`
thishost=`hostname`
isgrid=0
for node in $submitnodes ; do
  if [ "$node" == "$thishost" ]
  then
    isgrid=1
    echo "Grid Engine environment detected"
  fi
done

while test $itask -le $ntask
do
  if [ $itask -gt $mtask ]
      then nitask=$minitask
      else nitask=$(($minitask + 1))
  fi
  fitask=`expr $litask + 1`
  litask=`expr $litask + $nitask`
  if [ $nitask -lt $niba0 ]
      then n=$nitask
      else n=$niba0
  fi
  if [ $n -lt 1 ]
      then n=1
  fi
  nbatask=$(($nitask / $n))
  nba=`expr $nba + $nbatask`
  image1=$(($fframe + $fitask - 1)) #id number of the first image

  if [ $maxcpu -gt 1 ]
  then
    if [ $isgrid -eq 1 ]
    then
      # -sync y makes qsub block until the job has finished, so the
      # 'wait' below still works
      qsub -sync y -V -l h_rt=0:20:00 -cwd \
        forkintegrate_job \
        $image1 $nitask $itask $nbatask &
      #else echo "$image1 $nitask $itask $nbatask" | qrsh -V -cwd "mintegrate" &
    else
      echo "$image1 $nitask $itask $nbatask" | mintegrate_par &
    fi
  else
    echo "$image1 $nitask $itask $nbatask" | mintegrate &
  fi
  pids="$pids $!"  #append id of the background process just started

  itask=`expr $itask + 1`
done
trap "kill -15 $pids" 2 15  # 2:Control-C; 15:kill
wait  #wait for all background processes issued by this shell
rm -f mintegrate.tmp  #this temporary file was generated by mintegrate
rm -rf fork*job*      #remove Grid Engine job output files
</code>
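
As a worked example of the work division above (values are illustrative): with ni=362 images and ntask=4 jobs, every job integrates at least 90 images, and the remainder of 2 is spread over the first two jobs, which get 91 images each:

<code>
# Illustrative check of the arithmetic used by forkintegrate
ni=362; ntask=4
minitask=$(($ni / $ntask))  # 90 images minimum per job
mtask=$(($ni % $ntask))     # the first 2 jobs receive one extra image
echo "$minitask images per job; first $mtask jobs get one extra"
</code>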
 
<code>
#!/bin/bash
#forkintegrate_job - wrapper script submitted to Grid Engine by forkintegrate

image1=$1   #id number of the first image of this job
nitask=$2   #number of images in this job
itask=$3    #job number
nbatask=$4  #number of batches in this job

host=`uname -a | awk '{print $2}'`  #name of the execution host
echo $image1 $nitask $itask $nbatask $host >> jobs.log  #log where the job ran
echo $image1 $nitask $itask $nbatask | mintegrate_par
</code>
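
Finally, the modified scripts must be executable and must be found in $PATH ahead of the stock copies shipped with XDS; the directory below is an illustrative assumption:

<code>
chmod +x forkcolspot forkcolspot_job forkintegrate forkintegrate_job
# put the directory holding the modified scripts ahead of the XDS distribution
export PATH=/usr/local/xds-cluster:$PATH
which forkcolspot   # confirm the modified copy is the one found
</code>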




'''Grid Engine Installation'''