2VB1

== XDS processing ==

# use [[generate_XDS.INP]] to obtain a good starting point
# edit [[XDS.INP]] and change the following:
  ORGX=3130 ORGY=3040  ! for ADSC, header values are subject to interpretation; better inspect the table in IDXREF.LP!
  TRUSTED_REGION=0 1.5 ! we want the whole detector area
  ROTATION_AXIS=-1 0 0 ! at this beamline the spindle goes backwards!
# for faster processing on a machine with many cores, use e.g. (for 16 cores; 2 processors × 8 parallel jobs keep all 16 cores busy):
  MAXIMUM_NUMBER_OF_PROCESSORS=2
  MAXIMUM_NUMBER_OF_JOBS=8
and run "xds_par" again. It completes after about 5 minutes on a fast machine, and we may inspect CORRECT.LP.
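A minimal command-line sketch of this cycle (assuming [[XDS.INP]] has been edited as above; xds_par re-runs whatever steps are listed in its JOB= line):

 xds_par          # about 5 minutes on a fast machine
 less CORRECT.LP  # inspect the data quality statistics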

=== Optimization ===

The main target of optimization is the asymptotic (i.e. best achievable) I/sigma, ISa (Diederichs (2010) [http://dx.doi.org/10.1107/S0907444910014836 Acta Cryst. D 66, 733-40]), as printed out by CORRECT. A higher ISa means better data. However, ISa also rises if more reflections are rejected as outliers ("misfits"), so simply reducing WFAC1 is not considered optimization.
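After each trial, ISa can be checked quickly from the shell. A sketch, assuming the header line of the ISa table in CORRECT.LP contains the string "ISa", as in current XDS versions:

 grep -A 1 " ISa" CORRECT.LP   # prints the table header plus the line with the ISa value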
The following quantities may be tested for their influence on ISa:
* copying GXPARM.XDS to XPARM.XDS
* including the information from the first integration pass in XDS.INP: run "grep _E INTEGRATE.LP | tail -2" to obtain e.g.
  BEAM_DIVERGENCE=  0.386  BEAM_DIVERGENCE_E.S.D.=  0.039
  REFLECTING_RANGE= 0.669  REFLECTING_RANGE_E.S.D.= 0.096
and copy these two lines into XDS.INP (see the sketch below).
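Both items can be done from the shell. A minimal sketch; it assumes JOB= in XDS.INP has been reduced to "JOB= DEFPIX INTEGRATE CORRECT", because a renewed IDXREF run would overwrite the copied XPARM.XDS:

 cp GXPARM.XDS XPARM.XDS                     # refined geometry from CORRECT
 grep _E INTEGRATE.LP | tail -2 >> XDS.INP   # the two lines shown above
 xds_par                                     # second pass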


== Example: sweep e ==

=== [[XDS.INP]]; as generated by [[generate_XDS.INP]] ===
=== [[CORRECT.LP]] main table; 1st pass ===
=== [[XDS.INP]]; optimized ===
=== [[CORRECT.LP]] main table; optimization pass ===
== XSCALE results ==

A few sweeps were optimized by copying the two lines containing the mosaicity (REFLECTING_RANGE) and beam divergence values from INTEGRATE.LP to XDS.INP, as described under Optimization above.

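For reference, a minimal [[XSCALE.INP]] sketch for merging the sweeps (all file and directory names except "e" are hypothetical placeholders):

 OUTPUT_FILE=2vb1_all.ahkl
 INPUT_FILE=../e/XDS_ASCII.HKL   ! one INPUT_FILE line per sweep
 INPUT_FILE=../f/XDS_ASCII.HKL

followed by running xscale_par.
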
=== main table ===
== Comparison of data processing: published ''vs'' XDS results ==


<table border = "1">
<tr><b>
<td> published (2006) </td>
<td> 30-0.65Å (0.67-0.65Å) </td>
<td> 1331953 (12764) </td>
</table>
== timings for processing sweep "e" as a function of MAXIMUM_NUMBER_OF_PROCESSORS and MAXIMUM_NUMBER_OF_JOBS ==
The following is going to be rather technical! If you are only interested in crystallography, skip this.
Using
  MAXIMUM_NUMBER_OF_PROCESSORS=2
  MAXIMUM_NUMBER_OF_JOBS=8
we observe for the INTEGRATE step:
  total cpu time used              2063.6 sec
  total elapsed wall-clock time      296.1 sec

Using
  MAXIMUM_NUMBER_OF_PROCESSORS=1
  MAXIMUM_NUMBER_OF_JOBS=16
the times are
  total cpu time used              2077.1 sec
  total elapsed wall-clock time      408.2 sec

Using
  MAXIMUM_NUMBER_OF_PROCESSORS=4
  MAXIMUM_NUMBER_OF_JOBS=4
the times are
  total cpu time used              2102.8 sec
  total elapsed wall-clock time      315.6 sec

Using
  MAXIMUM_NUMBER_OF_PROCESSORS=16 ! the default for xds_par on a 16-core machine
  MAXIMUM_NUMBER_OF_JOBS=1        ! the default
the times are
  total cpu time used              2833.4 sec
  total elapsed wall-clock time      566.5 sec
but please note that this actually uses only 10 processors, since the default DELPHI=5 and the OSCILLATION_RANGE is 0.5° (5°/0.5° = 10 images per batch, so at most 10 processors can be busy at any time).

Using
  MAXIMUM_NUMBER_OF_PROCESSORS=4
  MAXIMUM_NUMBER_OF_JOBS=8
(thus overcommitting the available cores by a factor of 2) the times are
  total cpu time used              2263.5 sec
  total elapsed wall-clock time      320.8 sec

Using
  MAXIMUM_NUMBER_OF_PROCESSORS=4
  MAXIMUM_NUMBER_OF_JOBS=6
(thus overcommitting the available cores, but less severely) the times are
  total cpu time used              2367.6 sec
  total elapsed wall-clock time      267.2 sec
Thus,
  MAXIMUM_NUMBER_OF_PROCESSORS=4
  MAXIMUM_NUMBER_OF_JOBS=6
performs best on a machine with two Xeon X5570 CPUs (hyperthreading enabled, thus 16 logical cores), 24GB of memory, and a RAID1 array of two 1TB SATA disks. It should be noted that the dataset is 27GB, so reading it in 296 seconds corresponds to 92 MB/s of continuous reading. The processing time is thus limited by disk access, not by the CPU. And no, the data are not simply read from RAM (tested with "echo 3 > /proc/sys/vm/drop_caches" before the XDS run).
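The scan above can be scripted. A minimal sketch, assuming a complete first XDS run has been done in the current directory, that JOB= in XDS.INP includes INTEGRATE, and that the script runs as root (only needed for the cache-dropping line mentioned above):

 #!/bin/bash
 # time INTEGRATE for several PROCESSORS/JOBS combinations
 for combo in "2 8" "1 16" "4 4" "16 1" "4 8" "4 6"; do
   set -- $combo                                  # $1=processors, $2=jobs
   grep -v -e MAXIMUM_NUMBER_OF_PROCESSORS \
           -e MAXIMUM_NUMBER_OF_JOBS XDS.INP > XDS.INP.tmp
   mv XDS.INP.tmp XDS.INP                         # remove old settings
   echo "MAXIMUM_NUMBER_OF_PROCESSORS=$1" >> XDS.INP
   echo "MAXIMUM_NUMBER_OF_JOBS=$2"       >> XDS.INP
   sync; echo 3 > /proc/sys/vm/drop_caches        # fair, disk-bound timings
   xds_par > /dev/null
   echo -n "PROCESSORS=$1 JOBS=$2 : "
   grep "elapsed wall-clock" INTEGRATE.LP | tail -1
 done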