Performance: Difference between revisions


Revision as of 17:04, 12 January 2017


Kay


@@ Line 177: / Line 177: @@
 In that case, I'd suggest modifying e.g. <tt>forkcolspot_cluster</tt> so that it does not run <tt>mcolspot_par</tt> directly on the remote machine, but instead runs a script on that machine that checks the number of nodes and starts <tt>mcolspot_par</tt> on the right node.
+== Processing compressed data ==
+XDS can process data files that were previously compressed with compress (<code>.Z</code>), gzip (<code>.gz</code>), bzip2 (<code>.bz2</code>) or xz (<code>.xz</code>). It decompresses them on the fly into temporary files with standard names (<code>SCRATCH2_XXYY.tmp</code>), where XX stands for the "JOB" and YY for the number of the thread that produces the temporary file.
+Processing compressed files saves a lot of disk space, but decompression costs CPU time and I/O. This penalty can be reduced by
+* (Linux only) using symlinks that point into <code>/dev/shm</code>, so that the temporary <code>SCRATCH2</code> files are written to RAM instead of to a (network) disk. A script that does this (typically called <code>mklinks</code>) is:
+<pre>
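+#!/bin/bash
+# usage: mklinks MAXPROCS MAXJOBS
+[ -z "$1" ] && { echo "ERROR - first parameter (MAXPROCS) missing" >&2; exit 1; }
+[ -z "$2" ] && { echo "ERROR - second parameter (MAXJOBS) missing" >&2; exit 1; }
+maxprocs=$1
+maxjobs=$2
+# create a unique directory in RAM for the SCRATCH2 files of this dataset
+# (its name is derived from the path of the current directory):
+tempdir="/dev/shm/xds${PWD//\//_}"
+[ -d "$tempdir" ] && rm -rf -- "$tempdir"
+mkdir -- "$tempdir" || exit 1
+# one symlink per job (XX) and thread (YY): SCRATCH2_XXYY.tmp -> $tempdir/SCRATCH2_XXYY.tmp
+for j in $(seq 1 "$maxjobs"); do
+  for i in $(seq 1 "$maxprocs"); do
+    ln -sfn "$tempdir/SCRATCH2_$(printf "%02d%02d" "$j" "$i").tmp"
+  done
+done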
+</pre>
+This has to be run in the XDS processing directory of the current dataset before running <code>xds_par</code>, with MAXPROCS and MAXJOBS as its two arguments. After data processing has finished, clean up with this script (typically called <code>rmlinks</code>):
+<pre>
+#!/bin/bash
+# remove the RAM directory created by mklinks, and the symlinks pointing into it
+tempdir="/dev/shm/xds${PWD//\//_}"
+rm -rf -- "$tempdir"
+rm -f SCRATCH2*
+</pre>
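The two scripts are meant to bracket a single run of <code>xds_par</code>. The following is a minimal sketch, not part of XDS, that wraps all three steps. It assumes that <code>mklinks</code>, <code>rmlinks</code> and <code>xds_par</code> are in your <code>$PATH</code>, that MAXPROCS and MAXJOBS correspond to the XDS.INP keywords <code>MAXIMUM_NUMBER_OF_PROCESSORS=</code> and <code>MAXIMUM_NUMBER_OF_JOBS=</code>, and that all CPUs (<code>nproc</code>) and 1 job are reasonable fallbacks when a keyword is absent. The function names <code>xds_inp_value</code> and <code>run_xds_with_shm</code> are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers (not part of XDS). Source this file, then run
# run_xds_with_shm in the XDS processing directory of the dataset.

# Print the value of a numeric XDS.INP keyword, e.g. MAXIMUM_NUMBER_OF_JOBS.
# Everything after '!' on a line is treated as a comment; if the keyword
# occurs more than once, the last occurrence is used.
xds_inp_value() {
  local key=$1 file=${2:-XDS.INP}
  sed 's/!.*//' "$file" |
    grep -o "${key}=[[:space:]]*[0-9][0-9]*" |
    tail -n 1 |
    grep -o '[0-9][0-9]*$'
}

# Create the /dev/shm links, run xds_par, then remove the links again,
# also when xds_par exits with an error. Returns the exit status of xds_par.
run_xds_with_shm() {
  local procs jobs status
  procs=$(xds_inp_value MAXIMUM_NUMBER_OF_PROCESSORS)
  jobs=$(xds_inp_value MAXIMUM_NUMBER_OF_JOBS)
  # Assumed defaults when a keyword is missing: all CPUs, one job.
  procs=${procs:-$(nproc)}
  jobs=${jobs:-1}
  mklinks "$procs" "$jobs" || return 1
  xds_par
  status=$?
  rmlinks
  return "$status"
}
```

After sourcing the file, call <code>run_xds_with_shm</code> instead of <code>xds_par</code>. If the run is interrupted (e.g. with Ctrl-C), the cleanup step may be skipped, and <code>rmlinks</code> then has to be run by hand.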
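+* (for <code>.bz2</code> files) using the faster, multi-threaded <code>lbunzip2</code> (if it is installed) instead of <code>bunzip2</code>, simply by making a symlink to it named <code>bunzip2</code>. This assumes that <code>$HOME/bin</code> is in your <code>$PATH</code>, ahead of the directory that holds the system <code>bunzip2</code>:
+ ln -s "$(command -v lbunzip2)" "$HOME/bin/bunzip2"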
+If both measures are combined, the overhead of processing <code>.bz2</code> files is reduced to an insignificant level.
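Whether such a symlink actually takes effect depends on the order of the directories in <code>$PATH</code>. A small check, sketched below under the assumption that the link was created as <code>$HOME/bin/bunzip2</code> (the function name <code>check_bunzip2_override</code> is hypothetical), reports which <code>bunzip2</code> the shell would run:

```shell
#!/bin/bash
# Hypothetical check: does "bunzip2" resolve to the symlink in $HOME/bin?
# Prints the link target on success; complains on stderr otherwise.
check_bunzip2_override() {
  local found
  hash -r   # forget cached command locations so that PATH is searched afresh
  found=$(command -v bunzip2) || { echo "bunzip2 not found in PATH" >&2; return 1; }
  if [ "$found" = "$HOME/bin/bunzip2" ]; then
    echo "bunzip2 -> $(readlink -- "$found")"
  else
    echo "$found comes first in PATH; put \$HOME/bin in front of its directory" >&2
    return 1
  fi
}
```

Running <code>check_bunzip2_override</code> in the shell that will later start <code>xds_par</code> should print the path of <code>lbunzip2</code>.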
 == Linux kernel setting ==
   cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
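 should show <code>[always]</code> rather than <code>[never]</code> for transparent hugepages to be active: the active setting is the one in square brackets. On mainline kernels the same file is <code>/sys/kernel/mm/transparent_hugepage/enabled</code>.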
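Since only the bracketed word matters, the active mode can also be extracted in a script, e.g. to warn before a long processing run. A minimal sketch follows; the function name <code>thp_mode</code> is hypothetical, and it tries the mainline-kernel location before the Red Hat one.

```shell
#!/bin/bash
# Hypothetical helper: print the active transparent-hugepage mode, i.e. the
# word in square brackets, e.g. "always madvise [never]" -> "never".
# Files given as arguments are tried first, then the two standard locations.
thp_mode() {
  local file
  for file in "$@" \
      /sys/kernel/mm/transparent_hugepage/enabled \
      /sys/kernel/mm/redhat_transparent_hugepage/enabled; do
    if [ -r "$file" ]; then
      sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$file"
      return 0
    fi
  done
  echo "no transparent_hugepage/enabled file found" >&2
  return 1
}
```

For example, <code>[ "$(thp_mode)" = always ] || echo "transparent hugepages are not set to always"</code>.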