XDS can process data files that were previously compressed with compress (<code>.Z</code>), gzip (<code>.gz</code>), bzip2 (<code>.bz2</code>) or xz (<code>.xz</code>). It does this by on-the-fly decompression to temporary files with standard names (<code>SCRATCH2XXYY.tmp</code>) where XX (XX = 01..99) stands for the "JOB" and YY (YY = 01..99) for the thread number that produces the temporary file.
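XDS does this decompression internally. To illustrate the idea only (this is not the XDS code), the sketch below picks a decompressor from the file suffix and writes the result to a scratch file. The function name <code>decompress_to</code> is hypothetical, and the sketch assumes <code>gzip</code>, <code>bunzip2</code> and <code>xz</code> are on the $PATH. GNU <code>gzip -d</code> also reads <code>.Z</code> files, so no separate <code>uncompress</code> is used here.

```shell
#!/bin/bash
# Sketch only, not the XDS implementation: decompress one (possibly
# compressed) data file into a scratch file, choosing the tool by suffix.
# usage: decompress_to INPUT SCRATCHFILE
decompress_to() {
  case "$1" in
    *.Z|*.gz) gzip -dc "$1" ;;    # GNU gzip also reads compress (.Z) files
    *.bz2)    bunzip2 -c "$1" ;;  # found via $PATH, so a bunzip2 symlink is honoured
    *.xz)     xz -dc "$1" ;;
    *)        cat "$1" ;;         # uncompressed input: plain copy
  esac > "$2"
}
```

For example, <code>decompress_to frame_00001.cbf.gz scratch.tmp</code> (hypothetical file names) writes the uncompressed frame to <code>scratch.tmp</code>. Every frame costs a full decompression plus a write of the uncompressed data, which is where the CPU and I/O penalty comes from.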
Compression saves a lot of disk space, but decompression costs CPU time and I/O. This penalty can be reduced by
* (Linux only) using symlinks pointing to /dev/shm, so that the <code>SCRATCH2XXYY.tmp</code> files are written to RAM instead of to a (network) disk. A script (typically called <code>mklinks</code>) that achieves this is
<pre>
#!/bin/bash
# purpose: create symlinks for xds_par
# usage: mklinks [# of jobs]
maxjobs=$1
test -z "$1" && maxjobs=1
# number of threads = number of logical CPUs
maxprocs=$(grep -c ^processor /proc/cpuinfo)
echo "creating symlinks for $maxprocs threads and $maxjobs JOBs"
# create unique directory for SCRATCH2 files:
tempdir="/dev/shm/xds${PWD//\//_}"
rm -rf "$tempdir"
mkdir "$tempdir"
for j in $(seq 1 "$maxjobs"); do
  for i in $(seq 1 "$maxprocs"); do
    # with a single argument, ln creates the link in the current directory
    ln -sfn "$tempdir/SCRATCH2_$(printf '%02d%02d' "$j" "$i").tmp"
  done
done
</pre>
This has to be run in the XDS processing directory of the current dataset, before running <code>xds_par</code>. After data processing has finished, one may clean up with this script (typically called <code>rmlinks</code>):
<pre>
#!/bin/bash
# remove the RAM directory created by mklinks and the symlinks pointing into it
tempdir="/dev/shm/xds${PWD//\//_}"
rm -rf "$tempdir"
rm -f SCRATCH*
</pre>
* if decompressing <code>.bz2</code> files, one can use the faster, multi-threaded <code>lbunzip2</code> (if it is installed) simply by making a symlink to it (assuming $HOME/bin is in your $PATH, ahead of the directory that holds the system <code>bunzip2</code>):
 ln -s `which lbunzip2` $HOME/bin/bunzip2
Both measures can be combined.
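To check that both measures are actually in effect, helpers along the following lines can be sourced and run in the processing directory. The function names are made up for this sketch; it assumes the <code>SCRATCH2_XXYY.tmp</code> link names created by the <code>mklinks</code> script and the <code>readlink</code> and <code>basename</code> tools from GNU coreutils.

```shell
#!/bin/bash
# Hypothetical helpers, not part of XDS.

# Print how many SCRATCH2_*.tmp entries in the current directory are
# symlinks into /dev/shm (i.e. will be written to RAM).
count_shm_links() {
  local n=0 f
  for f in SCRATCH2_*.tmp; do
    # if nothing matches, the pattern stays literal and -L is false
    if [ -L "$f" ]; then
      case "$(readlink "$f")" in
        /dev/shm/*) n=$((n + 1)) ;;
      esac
    fi
  done
  echo "$n"
}

# Succeed if the bunzip2 found first in $PATH is a symlink to lbunzip2.
uses_lbunzip2() {
  local b
  b=$(command -v bunzip2) || return 1
  [ -L "$b" ] && [ "$(basename "$(readlink "$b")")" = lbunzip2 ]
}
```

After <code>mklinks 2</code> on a machine with 16 logical CPUs, <code>count_shm_links</code> should print 32 (2 JOBs × 16 threads). In a shell that has already run <code>bunzip2</code>, bash may still use the cached path; <code>hash -r</code> clears that cache after the new link is created.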
== Linux kernel setting ==
 cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
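should show <code>always</code>, not <code>never</code>, as the active setting for transparent hugepages to be used (the active setting is the one in brackets, e.g. <code>[always] madvise never</code>). On kernels without this Red Hat-specific path (e.g. mainline kernels), the same setting is in <code>/sys/kernel/mm/transparent_hugepage/enabled</code>.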
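A small sketch that prints only the active mode, trying the mainline path first and falling back to the Red Hat one. The name <code>thp_mode</code> is hypothetical; an optional file argument is read first, which is mainly useful for testing.

```shell
#!/bin/bash
# Print the active transparent hugepage mode, i.e. the bracketed word in
# e.g. "always madvise [never]". An optional FILE argument is tried first.
thp_mode() {
  local f
  for f in "${1:-}" \
           /sys/kernel/mm/transparent_hugepage/enabled \
           /sys/kernel/mm/redhat_transparent_hugepage/enabled; do
    if [ -n "$f" ] && [ -r "$f" ]; then
      # keep only the text between [ and ]
      sed -n 's/.*\[\([^]]*\)].*/\1/p' "$f"
      return
    fi
  done
  echo "no transparent hugepage setting found" >&2
  return 1
}
```

For example, <code>[ "$(thp_mode)" = always ] || echo "transparent hugepages not set to always"</code> can serve as a quick check before starting <code>xds_par</code>.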