2VB1

From XDSwiki
Revision as of 18:29, 20 February 2011

This page reports the processing of triclinic hen egg-white lysozyme data at 0.65 Å resolution (PDB id 2VB1). The data (sweeps a to h, each comprising 60 to 360 frames of 72 MB) were collected by Zbigniew Dauter at APS beamline 19-ID and are available from http://bl831.als.lbl.gov/example_data_sets/APS/19-ID/2vb1/ . Details of data collection, processing and refinement are published (http://journals.iucr.org/d/issues/2007/12/00/be5097/index.html).

XDS processing

ORGX=3130 ORGY=3040  ! for ADSC, header values are subject to interpretation; better inspect the table in IDXREF.LP!
TRUSTED_REGION=0 1.5 ! we want the whole detector area
ROTATION_AXIS=-1 0 0 ! at this beamline the spindle goes backwards!

  • for faster processing on a machine with many cores, use (e.g. for 16 cores):

MAXIMUM_NUMBER_OF_PROCESSORS=2
MAXIMUM_NUMBER_OF_JOBS=8
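The 16-core example pairs 2 processors per job with 8 jobs (2 × 8 = 16). The timings further down describe 4 × 8 = 32 as overcommitting 16 cores by a factor of 2. This sketch treats MAXIMUM_NUMBER_OF_PROCESSORS × MAXIMUM_NUMBER_OF_JOBS as the number of threads running at once. That is an assumption, but it is consistent with both of those examples. Under it, the settings that exactly fill a machine can be listed as follows. `job_layouts` and `overcommit_factor` are hypothetical helper names, not part of XDS.

```python
def job_layouts(cores):
    """All (processors, jobs) pairs whose product is exactly `cores`."""
    return [(p, cores // p) for p in range(1, cores + 1) if cores % p == 0]


def overcommit_factor(processors, jobs, cores):
    """Requested threads (processors * jobs) per available core; 1.0 is an exact fit."""
    return processors * jobs / cores


# Print the candidate XDS.INP settings for a 16-core machine.
for procs, jobs in job_layouts(16):
    print(f"MAXIMUM_NUMBER_OF_PROCESSORS={procs} MAXIMUM_NUMBER_OF_JOBS={jobs}")
```

As the timings below show, an exact fit is not necessarily the fastest choice, so these candidates are only a starting point for testing.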

For all the sweeps, processing stopped with an error message after the IDXREF step. By inspecting IDXREF.LP, one should make sure that everything works as it should, i.e. that a large fraction of the spots was actually indexed:

...
  63879 OUT OF   72321 SPOTS INDEXED.
...

***** DIFFRACTION PARAMETERS USED AT START OF INTEGRATION *****

REFINED VALUES OF DIFFRACTION PARAMETERS DERIVED FROM  63879 INDEXED SPOTS
REFINED PARAMETERS:   DISTANCE BEAM AXIS CELL ORIENTATION    
STANDARD DEVIATION OF SPOT    POSITION (PIXELS)     0.53
STANDARD DEVIATION OF SPINDLE POSITION (DEGREES)    0.12
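The page gives no numeric cutoff for "a large percentage", but the fraction itself is easy to extract from IDXREF.LP. This is a minimal sketch. It assumes the summary line has the form shown in the excerpt above, and it takes the last such line in the file, on the assumption that this reflects the final indexing. `indexed_fraction` is a hypothetical helper.

```python
import re

# Matches the IDXREF.LP summary line, e.g. "63879 OUT OF   72321 SPOTS INDEXED."
INDEXED_RE = re.compile(r"(\d+)\s+OUT OF\s+(\d+)\s+SPOTS INDEXED")


def indexed_fraction(idxref_text):
    """Return indexed/total from the last 'SPOTS INDEXED' line, or None if there is none."""
    matches = INDEXED_RE.findall(idxref_text)
    if not matches:
        return None
    indexed, total = map(int, matches[-1])
    return indexed / total


excerpt = "  63879 OUT OF   72321 SPOTS INDEXED.\n"
print(f"{indexed_fraction(excerpt):.1%}")  # prints 88.3%
```

In practice the whole file would be passed in, e.g. `indexed_fraction(open("IDXREF.LP").read())`.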

It may be possible to adjust some parameters (for COLSPOT) so that the error message does not occur, but it is not worth the effort. So we just change

JOBS=XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE CORRECT

to

JOBS=DEFPIX INTEGRATE CORRECT

and run "xds_par" again. It completes after about 5 minutes on a fast machine, and we may then inspect CORRECT.LP.
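When several sweeps have to be restarted the same way, the JOBS change can be scripted. This sketch assumes XDS.INP contains exactly one uncommented JOBS= keyword at the start of a line (leading blanks allowed). `restart_after_idxref` is a hypothetical helper. Writing the file back and running xds_par are left to the user.

```python
import re

# A JOBS= keyword at the start of a line. Commented lines ("! JOBS=...") and
# keywords such as MAXIMUM_NUMBER_OF_JOBS= do not match.
JOBS_RE = re.compile(r"^([ \t]*)JOBS=.*$", re.MULTILINE)


def restart_after_idxref(xds_inp_text):
    """Return XDS.INP text with its JOBS= line replaced by DEFPIX INTEGRATE CORRECT."""
    new_text, n = JOBS_RE.subn(r"\1JOBS=DEFPIX INTEGRATE CORRECT", xds_inp_text)
    if n != 1:
        raise ValueError(f"expected exactly one JOBS= line, found {n}")
    return new_text
```

Raising an error when the count is not exactly one is deliberate: it avoids silently editing a file whose layout differs from the assumption.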

Timings for processing sweep "e" as a function of MAXIMUM_NUMBER_OF_PROCESSORS and MAXIMUM_NUMBER_OF_JOBS

The following is going to be rather technical! If you are only interested in crystallography, skip this.

Using

MAXIMUM_NUMBER_OF_PROCESSORS=2
MAXIMUM_NUMBER_OF_JOBS=8

we observe for the INTEGRATE step:

total cpu time used               2063.6 sec
total elapsed wall-clock time      296.1 sec

Using

MAXIMUM_NUMBER_OF_PROCESSORS=1
MAXIMUM_NUMBER_OF_JOBS=16

the times are

total cpu time used               2077.1 sec
total elapsed wall-clock time      408.2 sec

Using

MAXIMUM_NUMBER_OF_PROCESSORS=4
MAXIMUM_NUMBER_OF_JOBS=4

the times are

total cpu time used               2102.8 sec
total elapsed wall-clock time      315.6 sec

Using

MAXIMUM_NUMBER_OF_PROCESSORS=16 ! the default for xds_par on a 16-core machine
MAXIMUM_NUMBER_OF_JOBS=1 ! the default

the times are

total cpu time used               2833.4 sec
total elapsed wall-clock time      566.5 sec

Using

MAXIMUM_NUMBER_OF_PROCESSORS=4
MAXIMUM_NUMBER_OF_JOBS=8

(thus overcommitting the available cores by a factor of 2) the times are

total cpu time used               2263.5 sec
total elapsed wall-clock time      320.8 sec

Using

MAXIMUM_NUMBER_OF_PROCESSORS=4
MAXIMUM_NUMBER_OF_JOBS=6

(thus overcommitting the available cores, but less severely) the times are

total cpu time used               2367.6 sec
total elapsed wall-clock time      267.2 sec
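To compare the runs side by side, the numbers above can be tabulated. In this sketch the ratio of total cpu time to wall-clock time is read as the average number of cores kept busy, which is an interpretation rather than something XDS reports. The overcommit factor uses the 16 cores mentioned in the text.

```python
# (MAXIMUM_NUMBER_OF_PROCESSORS, MAXIMUM_NUMBER_OF_JOBS) -> (total cpu s, wall-clock s),
# copied from the INTEGRATE timings for sweep "e" listed above.
TIMINGS = {
    (2, 8): (2063.6, 296.1),
    (1, 16): (2077.1, 408.2),
    (4, 4): (2102.8, 315.6),
    (16, 1): (2833.4, 566.5),
    (4, 8): (2263.5, 320.8),
    (4, 6): (2367.6, 267.2),
}


def summarize(timings, cores=16):
    """Rows of (procs, jobs, overcommit, cpu/wall, wall), fastest wall-clock time first."""
    rows = [
        (procs, jobs, procs * jobs / cores, cpu / wall, wall)
        for (procs, jobs), (cpu, wall) in timings.items()
    ]
    return sorted(rows, key=lambda row: row[4])


for procs, jobs, over, busy, wall in summarize(TIMINGS):
    print(f"PROCESSORS={procs:2d} JOBS={jobs:2d} overcommit={over:4.2f} "
          f"cpu/wall={busy:4.1f} wall={wall:6.1f} s")
```

With these numbers the 4/6 setting comes out fastest. It also keeps the most cores busy on average: about 8.9, versus about 7.0 for 2/8.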

Thus,

MAXIMUM_NUMBER_OF_PROCESSORS=4
MAXIMUM_NUMBER_OF_JOBS=6

performs best for a machine with two Xeon X5570 CPUs, 24 GB of memory and a RAID1 consisting of two 1 TB SATA disks. Note that the dataset is 27 GB. For the 296 seconds measured with MAXIMUM_NUMBER_OF_PROCESSORS=2 and MAXIMUM_NUMBER_OF_JOBS=8, this means about 92 MB/s of continuous reading, and for the 267 seconds of the best setting it means about 100 MB/s. The processing time is thus limited by disk access, not by the CPU. And no, the data are not simply read from RAM: this was tested by running "echo 3 > /proc/sys/vm/drop_caches" before the XDS run.
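The throughput figure is simple arithmetic. "27GB" could mean 27 × 1000 MB or 27 × 1024 MiB, so this sketch prints both readings. It covers the 296 s run quoted in the text and the 267 s best run. The helper name is hypothetical, and it assumes each byte is read once during the INTEGRATE wall-clock time.

```python
def throughput_mb_per_s(dataset_gb, wall_s, mb_per_gb=1000):
    """Average read rate if the whole dataset is streamed once during wall_s seconds."""
    return dataset_gb * mb_per_gb / wall_s


for wall in (296.1, 267.2):
    decimal = throughput_mb_per_s(27, wall)       # 1 GB = 1000 MB
    binary = throughput_mb_per_s(27, wall, 1024)  # 1 GiB = 1024 MiB
    print(f"{wall:5.1f} s: {decimal:5.1f} MB/s or {binary:5.1f} MiB/s")
```

For 296 s the two readings are about 91 and 93, bracketing the quoted 92 MB/s. For 267 s they are about 101 and 103.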


Comparison of XDS and published results of data processing

Statistic                                    Published (PDB 2VB1, REMARK 200)
completeness overall, 30-0.65 Å (%)          97.6
data redundancy overall                      7.1
R merge overall                              0.04
<I/sigma> overall                            36.20
completeness, 0.67-0.65 Å (%)                67.3
data redundancy, 0.67-0.65 Å                 2.7
R merge, 0.67-0.65 Å                         0.18
<I/sigma>, 0.67-0.65 Å                       4.20

REMARK 200  COMPLETENESS FOR RANGE    (%) : 97.6
REMARK 200  DATA REDUNDANCY                : 7.1
REMARK 200  R MERGE                    (I) : 0.04
REMARK 200  R SYM                      (I) : NULL
REMARK 200  <I/SIGMA(I)> FOR THE DATA SET  : 36.20
REMARK 200
REMARK 200 IN THE HIGHEST RESOLUTION SHELL.
REMARK 200  HIGHEST RESOLUTION SHELL, RANGE HIGH (A) : 0.65
REMARK 200  HIGHEST RESOLUTION SHELL, RANGE LOW  (A) : 0.67
REMARK 200  COMPLETENESS FOR SHELL    (%) : 67.3
REMARK 200  DATA REDUNDANCY IN SHELL      : 2.7
REMARK 200  R MERGE FOR SHELL          (I) : 0.18
REMARK 200  R SYM FOR SHELL            (I) : NULL
REMARK 200  <I/SIGMA(I)> FOR SHELL        : 4.20