Performance: Difference between revisions

XDS can process data files that were previously compressed with compress (<code>.Z</code>), gzip (<code>.gz</code>), bzip2 (<code>.bz2</code>) or xz (<code>.xz</code>). It does this by on-the-fly decompression to temporary files with standard names (<code>SCRATCH2XXYY.tmp</code>) where XX (XX = 01..99) stands for the "JOB" and YY (YY = 01..99) for the thread number that produces the temporary file.  


Compression saves a lot of disk space, but decompression is time-consuming in terms of CPU and I/O. The penalty associated with decompression can be mitigated by
* (Linux only) using symlinks pointing to /dev/shm which results in <code>SCRATCH2XXYY.tmp</code> being written to RAM instead of (network) disk. A script (typically called <code>mklinks</code>) achieving this is
<pre>
#!/bin/bash
# purpose: create symlinks for xds_par
# usage: mklinks [# of jobs]

# number of JOBs: first parameter, defaulting to 1 if it is missing
maxjobs=$1
test -z "$1" && maxjobs=1
# number of threads: one per processor listed in /proc/cpuinfo
maxprocs=$(grep processor /proc/cpuinfo | wc -l)
echo creating symlinks for $maxprocs threads and $maxjobs JOBs

# create unique directory for SCRATCH2 files:
tempdir="/dev/shm/xds${PWD//\//_}"
rm -rf "$tempdir"
mkdir "$tempdir"

# one symlink per JOB (j) and thread (i), created in the current directory
for j in $(seq 1 $maxjobs); do
   for i in $(seq 1 $maxprocs); do
     ln -sfn "$tempdir"/SCRATCH2_$(printf "%02d" "$j")$(printf "%02d" "$i").tmp
   done
done
</pre>
This has to be run in the XDS processing directory of the current dataset, before running <code>xds_par</code>. After finishing data processing, one may clean up with this script (typically called <code>rmlinks</code>):
<pre>
#!/bin/bash
# purpose: remove the RAM directory and the symlinks created by mklinks
tempdir="/dev/shm/xds${PWD//\//_}"
rm -rf "$tempdir"
rm -f SCRATCH*
</pre>
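The two scripts can be tied to a single processing run so that the RAM held in /dev/shm is released even when <code>xds_par</code> fails. The following is only a sketch: the function name <code>run_xds_in_ram</code> is hypothetical, and it assumes that <code>mklinks</code>, <code>rmlinks</code> and <code>xds_par</code> are all found in $PATH.

```shell
#!/bin/bash
# Sketch only: run_xds_in_ram is a hypothetical helper, not part of XDS.
# Assumes mklinks, rmlinks and xds_par are all available in $PATH.
run_xds_in_ram() {
    local status
    mklinks "$@" || return 1   # arguments are passed on to mklinks
    xds_par                    # SCRATCH2 files are now written via the symlinks
    status=$?
    rmlinks                    # clean up even if xds_par failed
    return $status             # report the exit status of xds_par
}
```

Called from the XDS processing directory, <code>run_xds_in_ram</code> returns the exit status of <code>xds_par</code>, so it can be used in a larger processing script.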
* if decompressing <code>.bz2</code> files, one can use the faster <code>lbunzip2</code> (if it is installed) simply by making a symlink to it (assuming $HOME/bin is in your $PATH):  
  ln -s `which lbunzip2` $HOME/bin/bunzip2
Both measures can be combined.
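The <code>lbunzip2</code> symlink can be created a little more defensively, so that nothing is linked when <code>lbunzip2</code> is not installed. This is a minimal sketch: the function name <code>link_lbunzip2</code> and its optional directory argument are hypothetical, and the default of $HOME/bin is the assumption already made above.

```shell
#!/bin/bash
# Sketch only: link_lbunzip2 is a hypothetical helper name.
# The optional first argument is the directory that receives the symlink;
# it defaults to $HOME/bin, which is assumed to be in $PATH.
link_lbunzip2() {
    bindir=${1:-$HOME/bin}
    # stop if lbunzip2 cannot be found
    fast=$(command -v lbunzip2) || { echo "lbunzip2 not found in PATH" >&2; return 1; }
    # create the directory if needed, then (re)create the symlink
    mkdir -p "$bindir" && ln -sfn "$fast" "$bindir/bunzip2"
}
```

After running <code>link_lbunzip2</code>, <code>which bunzip2</code> should report the symlink, provided that directory comes before the system directories in $PATH.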


== Linux kernel setting ==
  cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
should show <code>always</code>, not <code>never</code>, as the active setting (the active setting is the one in brackets).
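The bracketed value can also be extracted by a script. The sketch below is not part of XDS: the function name <code>thp_active</code> is hypothetical, and the second path is the location used by mainline kernels, checked here on the assumption that the <code>redhat_</code> path does not exist on every distribution.

```shell
#!/bin/bash
# Sketch only: thp_active is a hypothetical helper name.
# It prints the bracketed (active) word of a line such as "[always] madvise never".
thp_active() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Check the path given in the text first, then the mainline kernel path.
for f in /sys/kernel/mm/redhat_transparent_hugepage/enabled \
         /sys/kernel/mm/transparent_hugepage/enabled; do
    if [ -r "$f" ]; then
        echo "$f: $(thp_active "$(cat "$f")")"
    fi
done
```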