2VB1: Difference between revisions
Revision as of 10:22, 22 February 2011
This page reports the processing of triclinic hen egg-white lysozyme data at 0.65Å resolution (PDB id 2VB1). The data (sweeps a to h, each comprising 60 to 360 frames of 72MB) were collected by Zbigniew Dauter at APS 19-ID and are available from here. Details of data collection, processing and refinement are published.
XDS processing
- use generate_XDS.INP to obtain a good starting point
- edit XDS.INP and change the following:
 ORGX=3130 ORGY=3040 ! for ADSC, header values are subject to interpretation; better inspect the table in IDXREF.LP!
 TRUSTED_REGION=0 1.5 ! we want the whole detector area
 ROTATION_AXIS=-1 0 0 ! at this beamline the spindle goes backwards!
- for faster processing on a machine with many cores, use (e.g. for 16 cores):
 MAXIMUM_NUMBER_OF_PROCESSORS=2
 MAXIMUM_NUMBER_OF_JOBS=8
For all the sweeps, processing stopped with an error message after the IDXREF step. By inspecting IDXREF.LP, one should make sure that everything works as it should, i.e. that a large percentage of reflections was actually indexed nicely:
 ...
 63879 OUT OF 72321 SPOTS INDEXED.
 ...
 ***** DIFFRACTION PARAMETERS USED AT START OF INTEGRATION *****
 REFINED VALUES OF DIFFRACTION PARAMETERS DERIVED FROM 63879 INDEXED SPOTS
 REFINED PARAMETERS: DISTANCE BEAM AXIS CELL ORIENTATION
 STANDARD DEVIATION OF SPOT POSITION (PIXELS) 0.53
 STANDARD DEVIATION OF SPINDLE POSITION (DEGREES) 0.12
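The check for "a large percentage of reflections indexed" can be made explicit. The following is a minimal sketch, not part of XDS: the function name `indexed_fraction` is hypothetical, and the sample string only reproduces the line quoted above from IDXREF.LP.

```python
import re

def indexed_fraction(idxref_lp_text):
    """Return (indexed, total, fraction) from the 'SPOTS INDEXED' line of IDXREF.LP."""
    match = re.search(r"(\d+)\s+OUT OF\s+(\d+)\s+SPOTS INDEXED", idxref_lp_text)
    if match is None:
        raise ValueError("no 'SPOTS INDEXED' line found")
    indexed, total = int(match.group(1)), int(match.group(2))
    return indexed, total, indexed / total

# Sample text: only the line quoted in this page, surrounded by placeholders.
sample = "...\n 63879 OUT OF 72321 SPOTS INDEXED.\n...\n"
indexed, total, fraction = indexed_fraction(sample)
print(f"{indexed} of {total} spots indexed ({fraction:.1%})")  # 88.3%
```

For the numbers quoted above this gives about 88% of the spots indexed.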
It may be possible to adjust some parameters (for COLSPOT) so that the error message does not occur, but it is not worth the effort. So we just change
JOBS=XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE CORRECT
to
JOBS=DEFPIX INTEGRATE CORRECT
and run "xds_par" again. It completes after about 5 minutes on a fast machine, and we may inspect CORRECT.LP.
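When many sweeps have to be restarted in the same way, the edit of the job list can be scripted. This is only a sketch under assumptions: the helper `restrict_jobs` is hypothetical, it works on the text of XDS.INP rather than on the file, and it keeps whatever spelling of the keyword the active line already uses.

```python
import re

def restrict_jobs(xds_inp_text, steps=("DEFPIX", "INTEGRATE", "CORRECT")):
    """Replace the step list on active job lines; commented lines (starting with !) are untouched."""
    pattern = re.compile(r"^([ \t]*JOBS?=).*$", re.MULTILINE)
    return pattern.sub(lambda m: m.group(1) + " ".join(steps), xds_inp_text)

# Sample input built from the lines shown in this page.
sample = (
    "JOBS=XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE CORRECT\n"
    "ORGX=3130 ORGY=3040\n"
)
edited = restrict_jobs(sample)
print(edited)
```

The other lines of the input are passed through unchanged, so the function can be applied to a complete XDS.INP.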
Optimization
The main target of optimization is the asymptotic (i.e. best) I/sigma (ISa) (Diederichs (2010) Acta Cryst. D 66, 733-40) as printed out by CORRECT. A higher ISa means better data. However, ISa also rises if more reflections are thrown out as outliers ("misfits"), so merely reducing WFAC1 does not count as optimization.
The following quantities may be tested for their influence on ISa:
- copying GXPARM.XDS to XPARM.XDS
- including the information from the first integration pass into XDS.INP - just do "grep _E INTEGRATE.LP|tail -2" and get e.g.
 BEAM_DIVERGENCE= 0.386 BEAM_DIVERGENCE_E.S.D.= 0.039
 REFLECTING_RANGE= 0.669 REFLECTING_RANGE_E.S.D.= 0.096
copy these two lines into XDS.INP
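The grep/tail step can also be done from a script, for example when several sweeps are optimized in a row. The sketch below is an assumed Python equivalent of `grep _E INTEGRATE.LP|tail -2`; the function name `last_e_lines` is hypothetical, and the first two values in the sample text are made up for illustration (only the last two lines are the ones quoted above).

```python
def last_e_lines(integrate_lp_text, count=2):
    """Return the last `count` lines containing '_E', like grep _E | tail -2.

    Unlike grep, leading and trailing whitespace is stripped from each line.
    """
    hits = [line.strip() for line in integrate_lp_text.splitlines() if "_E" in line]
    return hits[-count:]

# Sample text: the first pair of values is invented, the last pair is from this page.
sample = """\
 BEAM_DIVERGENCE= 0.401 BEAM_DIVERGENCE_E.S.D.= 0.040
 REFLECTING_RANGE= 0.702 REFLECTING_RANGE_E.S.D.= 0.100
 some other output
 BEAM_DIVERGENCE= 0.386 BEAM_DIVERGENCE_E.S.D.= 0.039
 REFLECTING_RANGE= 0.669 REFLECTING_RANGE_E.S.D.= 0.096
"""
lines_for_xds_inp = last_e_lines(sample)
print("\n".join(lines_for_xds_inp))
```

The two returned lines are the ones to paste into XDS.INP.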
Example: sweep e
XDS.INP; as generated by generate_XDS.INP
CORRECT.LP main table; 1st pass
XDS.INP; optimized
CORRECT.LP main table; optimization pass
XSCALE results
A few sweeps were optimized by copying the two lines containing mosaicity and beam divergence values from INTEGRATE.LP to XDS.INP.
main table
 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION      NUMBER OF REFLECTIONS COMPLETENESS  R-FACTOR  R-FACTOR  COMPARED  I/SIGMA  R-meas  Rmrgd-F  Anomal  SigAno    Nano
      LIMIT  OBSERVED  UNIQUE POSSIBLE      OF DATA  observed  expected                                        Corr
       2.91     15799    2114     2147        98.5%      2.3%      2.5%     15787    73.42    2.6%     1.1%    -15%   0.705    1969
       2.06     39607    3830     3856        99.3%      2.5%      2.8%     39602    81.49    2.6%     0.9%    -11%   0.750    3794
       1.68     64423    5068     5087        99.6%      3.1%      3.3%     64415    82.27    3.3%     1.0%     -3%   0.843    5018
       1.45     72869    6147     6163        99.7%      3.2%      3.5%     72867    77.43    3.4%     1.0%      0%   0.833    6055
       1.30     71079    6652     6657        99.9%      3.3%      3.5%     71079    70.69    3.4%     1.1%      8%   0.865    6506
       1.19     74584    7287     7298        99.8%      3.2%      3.4%     74575    66.78    3.4%     1.2%      5%   0.870    7060
       1.10     84893    8268     8278        99.9%      3.5%      3.7%     84865    62.98    3.6%     1.3%      5%   0.858    7983
       1.03     87893    8585     8603        99.8%      4.2%      4.4%     87859    56.04    4.4%     1.5%      4%   0.828    8238
       0.97     92833    9457     9465        99.9%      5.2%      5.6%     92810    48.70    5.5%     1.7%      6%   0.802    9010
       0.92     83981    9911     9927        99.8%      5.7%      6.3%     83954    41.48    6.0%     2.1%      5%   0.785    9362
       0.88     74101    9620     9621       100.0%      6.3%      7.2%     74083    35.53    6.7%     2.6%      5%   0.785    9041
       0.84     81383   11511    11518        99.9%      6.8%      7.7%     81361    30.26    7.3%     3.3%      1%   0.760   10616
       0.81     67616   10240    10247        99.9%      7.1%      7.8%     67596    25.84    7.7%     4.2%      1%   0.782    9368
       0.78     74077   11807    11817        99.9%      7.2%      7.3%     74049    22.26    7.8%     5.2%      1%   0.797   10697
       0.75     86236   13831    13839        99.9%      8.5%      8.7%     86206    18.77    9.3%     6.7%      2%   0.809   12497
       0.73     64601   10481    10488        99.9%     10.4%     10.5%     64573    15.77   11.3%     8.2%      2%   0.810    9375
       0.71     71886   11727    11741        99.9%     12.8%     13.0%     71835    13.05   14.0%    10.6%      2%   0.800   10420
       0.69     80233   13156    13163        99.9%     16.5%     16.9%     80130    10.32   18.1%    13.7%      1%   0.796   11661
       0.67     84259   14746    14766        99.9%     22.0%     22.5%     84056     7.61   24.1%    19.6%      3%   0.789   12468
       0.65     60775   15579    16551        94.1%     27.5%     30.3%     59893     4.49   31.7%    32.3%      1%   0.723    8936
      total   1433128  190017   191232        99.4%      3.3%      3.5%   1431595    33.18    3.5%     3.5%      2%   0.801  170074
Comparison of data processing: published (2006) vs XDS results
processing | resolution (highest resolution range) | observations | unique reflections | Multiplicity | Completeness (%) | R merge (%) | mean I/sigma |
published (2006) | 30-0.65Å (0.67-0.65Å) | 1331953 (12764) | 187165 (6353) | 7.1 (2.7) | 97.6 (67.3) | 4.5 (18.4) | 36.2 (4.2) |
XDS | 30-0.65Å (0.67-0.65Å) | 1433128 (60775) | 190017 (15579) | 7.5 (3.9) | 99.4 (94.1) | 3.3 (27.5) | 33.2 (4.5) |
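The overall multiplicity and completeness in the comparison follow directly from the reflection counts. As a small worked example (the variable names are only illustrative, and the possible number of unique reflections, 191232, is taken from the "total" line of the XSCALE table):

```python
# Overall (observations, unique reflections) as listed in the comparison table.
counts = {
    "published (2006)": (1331953, 187165),
    "XDS": (1433128, 190017),
}

# Multiplicity = observations per unique reflection.
multiplicity = {name: round(obs / uniq, 1) for name, (obs, uniq) in counts.items()}

# Completeness of the XDS data = unique reflections / possible reflections.
xds_completeness = 190017 / 191232

print(multiplicity)                # {'published (2006)': 7.1, 'XDS': 7.5}
print(f"{xds_completeness:.1%}")   # 99.4%
```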
timings for processing sweep "e" as a function of MAXIMUM_NUMBER_OF_PROCESSORS and MAXIMUM_NUMBER_OF_JOBS
The following is going to be rather technical! If you are only interested in crystallography, skip this.
Using
 MAXIMUM_NUMBER_OF_PROCESSORS=2
 MAXIMUM_NUMBER_OF_JOBS=8
we observe for the INTEGRATE step:
 total cpu time used 2063.6 sec
 total elapsed wall-clock time 296.1 sec
Using
 MAXIMUM_NUMBER_OF_PROCESSORS=1
 MAXIMUM_NUMBER_OF_JOBS=16
the times are
 total cpu time used 2077.1 sec
 total elapsed wall-clock time 408.2 sec
Using
 MAXIMUM_NUMBER_OF_PROCESSORS=4
 MAXIMUM_NUMBER_OF_JOBS=4
the times are
 total cpu time used 2102.8 sec
 total elapsed wall-clock time 315.6 sec
Using
 MAXIMUM_NUMBER_OF_PROCESSORS=16 ! the default for xds_par on a 16-core machine
 MAXIMUM_NUMBER_OF_JOBS=1 ! the default
the times are
 total cpu time used 2833.4 sec
 total elapsed wall-clock time 566.5 sec
but please note that this actually uses only 10 processors, since with the default DELPHI=5 and an OSCILLATION_RANGE of 0.5° each batch comprises only 5/0.5 = 10 frames.
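The figure of 10 processors is simple arithmetic on DELPHI and OSCILLATION_RANGE. The sketch below spells it out; the function names are hypothetical, and the rule that a job cannot keep more processors busy than a batch has frames is an assumption taken from the remark above, not from the XDS documentation.

```python
def frames_per_batch(delphi_deg, oscillation_range_deg):
    """Number of frames in one integration batch covering DELPHI degrees."""
    return round(delphi_deg / oscillation_range_deg)

def processors_in_use(max_processors, delphi_deg=5.0, oscillation_range_deg=0.5):
    # Assumption: a job cannot keep more processors busy than a batch has frames.
    return min(max_processors, frames_per_batch(delphi_deg, oscillation_range_deg))

print(frames_per_batch(5.0, 0.5))  # 10
print(processors_in_use(16))       # 10
```

Under the same assumption, a setting of MAXIMUM_NUMBER_OF_PROCESSORS at or below 10 would not be limited by the batch size for this dataset.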
Using
 MAXIMUM_NUMBER_OF_PROCESSORS=4
 MAXIMUM_NUMBER_OF_JOBS=8
(thus overcommitting the available cores by a factor of 2) the times are
 total cpu time used 2263.5 sec
 total elapsed wall-clock time 320.8 sec
Using
 MAXIMUM_NUMBER_OF_PROCESSORS=4
 MAXIMUM_NUMBER_OF_JOBS=6
(thus overcommitting the available cores, but less severely) the times are
 total cpu time used 2367.6 sec
 total elapsed wall-clock time 267.2 sec
Thus,
 MAXIMUM_NUMBER_OF_PROCESSORS=4
 MAXIMUM_NUMBER_OF_JOBS=6
performs best on a machine with two Xeon X5570 processors (HT enabled, thus 16 logical cores), 24GB of memory and a RAID1 consisting of two 1TB SATA disks. It should be noted that the dataset has 27GB, so reading it in 296 seconds means about 92 MB/s of continuous reading. The processing time is thus limited by disk access, not by the CPU. And no, the data are not simply read from RAM (tested by "echo 3 > /proc/sys/vm/drop_caches" before the XDS run).
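The timings above can be tabulated to compare the settings side by side. This is only a sketch: `CORES = 16` is an assumption (it is what the "factor of 2" remark for the 4 x 8 setting implies), and the dataset size is taken as 27000 MB, which gives about 91 MB/s for the 296 s run, close to the figure quoted above.

```python
CORES = 16           # assumption: logical cores of the test machine
DATASET_MB = 27_000  # 27 GB, taken in decimal units

# (MAXIMUM_NUMBER_OF_PROCESSORS, MAXIMUM_NUMBER_OF_JOBS, cpu seconds, wall-clock seconds)
runs = [
    (2, 8, 2063.6, 296.1),
    (1, 16, 2077.1, 408.2),
    (4, 4, 2102.8, 315.6),
    (16, 1, 2833.4, 566.5),
    (4, 8, 2263.5, 320.8),
    (4, 6, 2367.6, 267.2),
]

for procs, jobs, cpu, wall in runs:
    load = procs * jobs / CORES  # values above 1 mean the cores are overcommitted
    print(f"{procs:2d} x {jobs:2d}: load {load:.2f}, wall {wall:6.1f} s, "
          f"{DATASET_MB / wall:5.1f} MB/s, cpu/wall {cpu / wall:4.1f}")

# The fastest setting is the one with the smallest wall-clock time.
best = min(runs, key=lambda run: run[3])
print("fastest:", best[0], "processors x", best[1], "jobs")
```

The cpu/wall ratio stays below the number of requested threads in every run, which is consistent with the processing being limited by disk access.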