In that case, I'd suggest modifying e.g. <tt>forkcolspot_cluster</tt> so that it does not run <tt>mcolspot_par</tt> directly on the remote machine, but instead runs a script on that machine that checks the number of nodes and starts <tt>mcolspot_par</tt> on the right node.
== Processing compressed data ==
XDS can process data files that were previously compressed with compress (<code>.Z</code>), gzip (<code>.gz</code>), bzip2 (<code>.bz2</code>) or xz (<code>.xz</code>). It does this by on-the-fly decompression to temporary files with standard names (<code>SCRATCH2XXYY.tmp</code>), where XX stands for the "JOB" and YY for the number of the thread that produces the temporary file.
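To make the naming scheme concrete, here is a hypothetical shell illustration of what on-the-fly decompression of one frame amounts to. XDS does this internally and this is not its code; the function names are made up, and the <code>SCRATCH2_XXYY.tmp</code> spelling (with underscore) follows the <code>mklinks</code> script further down. Because a shell redirection follows an existing symlink, a scratch file that has been replaced by a link into <code>/dev/shm</code> ends up in RAM, which is what the first measure below relies on.

```bash
#!/bin/bash
# Illustration only, not XDS code: decompress one frame into the scratch
# file of a given job (XX) and thread (YY), as described above.

# Hypothetical helper: SCRATCH2_XXYY.tmp name, spelled as in mklinks.
scratch_name() {   # $1 = job number, $2 = thread number
    printf 'SCRATCH2_%02d%02d.tmp' "$1" "$2"
}

# Hypothetical helper: pick the decompressor from the file suffix.
decompress_to_scratch() {   # $1 = frame, $2 = job, $3 = thread
    local frame=$1 tool
    case $frame in
        *.Z|*.gz) tool=gzip ;;    # gzip -d also reads compress(1) .Z files
        *.bz2)    tool=bzip2 ;;
        *.xz)     tool=xz ;;
        *)        echo "not a compressed frame: $frame" >&2; return 1 ;;
    esac
    # The redirection follows an existing symlink, so if mklinks has
    # replaced the scratch file by a link into /dev/shm, the data go to RAM.
    "$tool" -dc -- "$frame" > "$(scratch_name "$2" "$3")"
}
```

For example, <code>decompress_to_scratch img_0001.cbf.bz2 1 3</code> would write the decompressed frame to <code>SCRATCH2_0103.tmp</code>.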
Keeping the data compressed saves a lot of disk space, but decompression costs CPU time and I/O. This penalty can be mitigated by
* (Linux only) using symlinks that point into <code>/dev/shm</code>, so that the <code>SCRATCH2XXYY.tmp</code> files are written to RAM instead of to a (network) disk. A script achieving this (typically called <code>mklinks</code>) is
<pre>
#!/bin/bash
# usage: mklinks MAXPROCS MAXJOBS
test -z "$1" && echo ERROR - first parameter \(MAXPROCS\) missing && exit 1
test -z "$2" && echo ERROR - second parameter \(MAXJOBS\) missing && exit 1
maxprocs=$1
maxjobs=$2
# create unique directory for SCRATCH2 files (name derived from the current path):
tempdir="/dev/shm/xds${PWD//\//_}"
[ -d "$tempdir" ] && rm -rf "$tempdir"
mkdir "$tempdir"
# one symlink per job (XX) and thread (YY), each pointing into $tempdir:
for j in $(seq 1 $maxjobs); do
  for i in $(seq 1 $maxprocs); do
    ln -sfn "$tempdir/SCRATCH2_$(printf "%02d" "$j")$(printf "%02d" "$i").tmp"
  done
done
</pre>
This script has to be run in the XDS processing directory of the current dataset before running <code>xds_par</code>. After data processing has finished, clean up with this script (typically called <code>rmlinks</code>):
<pre>
#!/bin/bash
# remove the RAM directory and the SCRATCH2 symlinks of the current dataset
tempdir="/dev/shm/xds${PWD//\//_}"
rm -rf "$tempdir"
rm -f SCRATCH2*
</pre>
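Since forgetting <code>rmlinks</code> leaves data in RAM, one may want to wrap the three steps. The following is a hypothetical wrapper, not part of XDS, under these assumptions: <code>mklinks</code> and <code>rmlinks</code> are executable and in <code>$PATH</code>, and their two arguments should equal <code>MAXIMUM_NUMBER_OF_PROCESSORS</code> and <code>MAXIMUM_NUMBER_OF_JOBS</code> from <code>XDS.INP</code>, with <code>nproc</code> and 1 as assumed fallbacks when a keyword is absent. The function names <code>xds_inp_value</code> and <code>xds_in_ram</code> are made up.

```bash
#!/bin/bash
# Hypothetical wrapper, not part of XDS: make the /dev/shm links with
# mklinks, run XDS, and always clean up with rmlinks afterwards.
# Assumptions: mklinks and rmlinks are executable and in $PATH, and their
# arguments should equal MAXIMUM_NUMBER_OF_PROCESSORS and
# MAXIMUM_NUMBER_OF_JOBS from XDS.INP (fallbacks: nproc and 1).

# Print the value of keyword $1 in file $3, or default $2 if absent.
# If the keyword occurs more than once, this sketch takes the last one.
xds_inp_value() {
    local v
    # strip comments (everything after "!") before searching
    v=$(sed 's/!.*//' "$3" | grep -o "$1= *[0-9][0-9]*" |
        tail -n 1 | grep -o '[0-9][0-9]*$') || true
    echo "${v:-$2}"
}

# Usage, in the XDS processing directory: xds_in_ram [command ...]
# The body runs in a subshell, so its EXIT trap fires when the body ends.
xds_in_ram() (
    maxprocs=$(xds_inp_value MAXIMUM_NUMBER_OF_PROCESSORS "$(nproc)" XDS.INP)
    maxjobs=$(xds_inp_value MAXIMUM_NUMBER_OF_JOBS 1 XDS.INP)
    mklinks "$maxprocs" "$maxjobs" || exit 1
    trap rmlinks EXIT    # also runs if the command below fails
    if [ $# -eq 0 ]; then set -- xds_par; fi
    "$@"
)
```

The subshell body keeps the trap local to the wrapper, and the exit status of <code>xds_in_ram</code> is that of the XDS run, not of <code>rmlinks</code>.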
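* If <code>.bz2</code> files are decompressed, the faster, parallel <code>lbunzip2</code> can be used instead (if it is installed) simply by making a symlink to it named <code>bunzip2</code>. This assumes that <code>$HOME/bin</code> is in your <code>$PATH</code> and comes before the directory holding the system <code>bunzip2</code>: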
 ln -s $(which lbunzip2) $HOME/bin/bunzip2
If both measures are combined, the overhead of processing <code>.bz2</code> files is reduced to an insignificant level.
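Whether the <code>lbunzip2</code> symlink is actually picked up depends on the order of directories in <code>$PATH</code>. A small hypothetical check, not part of XDS (the function name is made up), could look like this; it only follows one level of symlink, so it also works where <code>lbunzip2</code> is itself a link to <code>lbzip2</code>:

```bash
#!/bin/bash
# Hypothetical check, not part of XDS: which bunzip2 does the current $PATH
# select, and is it a symlink to lbunzip2? (only one link level is read)
check_bunzip2() {
    local path target
    hash -r    # forget cached command locations in this shell
    path=$(command -v bunzip2) || { echo "no bunzip2 in \$PATH"; return 1; }
    target=$(readlink "$path") || target=""    # empty if not a symlink
    if [ "${target##*/}" = lbunzip2 ]; then
        echo "OK: $path -> $target"
    else
        echo "$path is not a link to lbunzip2; is \$HOME/bin early enough in \$PATH?"
        return 1
    fi
}
```

Run <code>check_bunzip2</code> in the same kind of shell that will start <code>xds_par</code>, since that shell's <code>$PATH</code> is what matters.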
== Linux kernel setting ==
 cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
should show <code>always</code>, not <code>never</code>, as the active setting (the active setting is the one in square brackets) for transparent huge pages to be in effect. On kernels without the Red Hat naming, the file is <code>/sys/kernel/mm/transparent_hugepage/enabled</code>.
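The bracket convention is easy to check from a script, e.g. on many nodes at once. This is a hypothetical sketch (function names made up); besides the Red Hat path given above it also tries the upstream kernel path, and it treats only <code>always</code> as "on", following the text above.

```bash
#!/bin/bash
# Print the active (bracketed) choice from a sysfs setting file,
# e.g. "[always] madvise never" -> "always".
active_setting() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Hypothetical helper: succeed if transparent huge pages are set to "always".
# Tries the Red Hat path first, then the upstream kernel path.
thp_check() {
    local f mode
    for f in /sys/kernel/mm/redhat_transparent_hugepage/enabled \
             /sys/kernel/mm/transparent_hugepage/enabled; do
        [ -r "$f" ] || continue
        mode=$(active_setting "$f")
        echo "$f: $mode"
        [ "$mode" = always ]
        return    # returns the status of the test above
    done
    echo "no transparent_hugepage setting found" >&2
    return 1
}
```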