2VB1

From XDSwiki
Latest revision as of 14:13, 24 March 2020

This page reports the processing of triclinic hen egg-white lysozyme data at 0.65 Å resolution (PDB id 2VB1). The data (sweeps a to h, each comprising 60 to 360 frames of 72 MB) were collected by Zbigniew Dauter at APS beamline 19-ID and are available from here. Details of data collection, processing and refinement are published.

XDS processing

  1. use generate_XDS.INP to obtain a good starting point
  2. edit XDS.INP and change/add the following:
ORGX=3130 ORGY=3040  ! for ADSC, header values are subject to interpretation; these values from visual inspection
! the following is for masking the beamstop shadow in sweeps c-d
UNTRUSTED_RECTANGLE=0 3189 2960 3087 ! use XDS-viewer or ADXV to find the values
! the following is for sweeps e-h
UNTRUSTED_RECTANGLE=1 3160 3000 3070
TRUSTED_REGION=0 1.5 ! we want the whole detector area
ROTATION_AXIS=-1 0 0 ! at this beamline the spindle goes backwards!
SILICON=34.812736 ! account for theta-dependent absorption in the CCD's phosphor. The correction is only 
! significant for hi-res data; 34.812736=32*(value for silicon as printed to CORRECT.LP if SILICON= not given)
MAXIMUM_NUMBER_OF_PROCESSORS=4 ! for fast processing on a machine with many cores (e.g. for 16 cores)
MAXIMUM_NUMBER_OF_JOBS=6 ! "overcommit" the available cores but on the whole this produces results faster
SPACE_GROUP_NUMBER=1                   ! this is known
UNIT_CELL_CONSTANTS=  27.07 31.25 33.76 87.98 108.00 112.11  ! from 2vb1
FRIEDEL'S_LAW=TRUE  ! we're not concerned with the anomalous signal

Then run "xds_par". It completes after about 5 minutes on a fast machine; afterwards, inspect (at least) IDXREF.LP and CORRECT.LP (see below), and use "XDS-viewer FRAME.cbf" to get a visual impression of the integration of the last frame. In IDXREF.LP, make sure that everything works as it should, i.e. that a large percentage of the spots was indexed, e.g.:

...
  63879 OUT OF   72321 SPOTS INDEXED.
...

***** DIFFRACTION PARAMETERS USED AT START OF INTEGRATION *****

REFINED VALUES OF DIFFRACTION PARAMETERS DERIVED FROM  63879 INDEXED SPOTS
REFINED PARAMETERS:   DISTANCE BEAM AXIS CELL ORIENTATION    
STANDARD DEVIATION OF SPOT    POSITION (PIXELS)     0.53
STANDARD DEVIATION OF SPINDLE POSITION (DEGREES)    0.12
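The "SPOTS INDEXED" line can be turned into a percentage with a few lines of code. This is a minimal Python sketch, assuming the line has exactly the layout of the excerpt above (`indexed_fraction` is a hypothetical helper, not part of XDS); other XDS versions may word it differently.

```python
import re


def indexed_fraction(idxref_text: str) -> float:
    """Fraction of spots indexed, read from the 'SPOTS INDEXED' line of IDXREF.LP."""
    match = re.search(r"(\d+)\s+OUT OF\s+(\d+)\s+SPOTS INDEXED", idxref_text)
    if match is None:
        raise ValueError("no 'SPOTS INDEXED' line found")
    indexed, total = (int(g) for g in match.groups())
    return indexed / total


# The line from the IDXREF.LP excerpt above
excerpt = "  63879 OUT OF   72321 SPOTS INDEXED."
print(f"{indexed_fraction(excerpt):.1%} of the spots were indexed")
```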

Optimization

The main target of optimization is the asymptotic (i.e. best) I/sigma (ISa) (Diederichs (2010) Acta Cryst. D 66, 733-40) as printed out by CORRECT (and XSCALE). A higher ISa should mean better data.

However, ISa also rises if more reflections are thrown out as outliers ("misfits"), so merely reducing WFAC1 does not count as optimization. Note that the default WFAC1 is 1, which should result in the rejection of about 1% of the observations. If you feel that 1% is too much, increase WFAC1 to, say, 1.5; that should reduce the rejections to below about 0.1%. This slightly increases completeness, but reduces I/sigma and ISa, and increases the R-factors.
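For reference, ISa follows from the two error-model parameters a and b that CORRECT (and XSCALE) print next to it. As we understand the model of Diederichs (2010), the sigmas are rescaled as σ'² = a(σ² + b·I²), so for very strong reflections I/σ' tends to 1/√(ab). The sketch below (hypothetical helper `isa`) checks this against the a, b and ISa values printed in the CORRECT.LP excerpts of sweep e further down; small differences come from the rounding of the printed a and b.

```python
import math


def isa(a: float, b: float) -> float:
    """Asymptotic I/sigma for the error model sigma'^2 = a * (sigma^2 + b * I^2).

    For very strong reflections sigma^2 is negligible compared with b * I^2,
    so I / sigma' approaches 1 / sqrt(a * b).
    """
    return 1.0 / math.sqrt(a * b)


# a and b as printed in CORRECT.LP for sweep e
print(round(isa(6.630, 1.091e-4), 2))  # CORRECT.LP prints ISa = 37.18 (1st pass)
print(round(isa(6.439, 1.076e-4), 2))  # CORRECT.LP prints ISa = 37.98 (optimization pass)
```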

The following quantities may be tested for their influence on ISa:

  • copying GXPARM.XDS to XPARM.XDS
  • including the information from the first integration pass into XDS.INP - just do "grep _E INTEGRATE.LP|tail -2" and get e.g.
BEAM_DIVERGENCE=   0.386  BEAM_DIVERGENCE_E.S.D.=   0.039
REFLECTING_RANGE=  0.669  REFLECTING_RANGE_E.S.D.=  0.096

copy these two lines into XDS.INP

  • prevent refinement in INTEGRATE: REFINE(INTEGRATE)= !
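The same extraction can be scripted, e.g. when several sweeps are processed. A minimal Python sketch that mimics `grep _E INTEGRATE.LP | tail -2` (the function names are hypothetical); like the grep command, it simply relies on the last two matching lines being the BEAM_DIVERGENCE and REFLECTING_RANGE lines.

```python
from pathlib import Path
from typing import List


def last_parameter_lines(integrate_lp: str, n: int = 2) -> List[str]:
    """Return the last n lines containing '_E', like `grep _E INTEGRATE.LP | tail -2`.

    Surrounding whitespace is stripped so the lines can be pasted into XDS.INP.
    """
    hits = [line.strip() for line in integrate_lp.splitlines() if "_E" in line]
    return hits[-n:] if n > 0 else []


def print_integrate_parameters(path: str = "INTEGRATE.LP") -> None:
    """Print the lines to be copied into XDS.INP (run in the sweep directory)."""
    for line in last_parameter_lines(Path(path).read_text()):
        print(line)
```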

Example: sweep e

XDS.INP; as generated by generate_XDS.INP

generate_XDS.INP "../../APS/19-ID/2vb1/p1lyso_e.0???.img"

Then include the changes detailed above, resulting in:

JOB= XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE CORRECT
MAXIMUM_NUMBER_OF_PROCESSORS=4
MAXIMUM_NUMBER_OF_JOBS=6
ORGX= 3130 ORGY= 3040  ! check these values with adxv !
UNTRUSTED_RECTANGLE=1 3160 3000 3070  ! <xmin xmax ymin ymax> to mask shadow of beamstop; XDS-viewer to find out
DETECTOR_DISTANCE= 99.9954
OSCILLATION_RANGE= 0.500
X-RAY_WAVELENGTH=   0.6525486
NAME_TEMPLATE_OF_DATA_FRAMES=../../APS/19-ID/2vb1/p1lyso_e.0???.img
! REFERENCE_DATA_SET=xxx/XDS_ASCII.HKL ! e.g. to ensure consistent indexing  
DATA_RANGE=1 360
SPOT_RANGE=1 180
! BACKGROUND_RANGE=1 10 ! rather use defaults (first 5 degrees of rotation)

SPACE_GROUP_NUMBER=1                   ! 0 if unknown
UNIT_CELL_CONSTANTS= 27.07    31.25    33.76  87.98 108.00 112.11  ! PDB 2vb1
INCLUDE_RESOLUTION_RANGE=50 0  ! after CORRECT, insert high resol limit; re-run CORRECT


!FRIEDEL'S_LAW=FALSE     ! This acts only on the CORRECT step
! If the anom signal turns out to be, or is known to be, very low or absent,
! use FRIEDEL'S_LAW=TRUE instead (or comment out the line); re-run CORRECT

! remove the "!" in the following line:
! STRICT_ABSORPTION_CORRECTION=TRUE
! if the anomalous signal is strong: in that case, in CORRECT.LP the three
! "CHI^2-VALUE OF FIT OF CORRECTION FACTORS" values are significantly > 1, e.g. 1.5
!
! exclude (mask) untrusted areas of detector, e.g. beamstop shadow :
! UNTRUSTED_RECTANGLE= 1800 1950 2100 2150 ! x-min x-max y-min y-max ! repeat
! UNTRUSTED_ELLIPSE= 2034 2070 1850 2240 ! x-min x-max y-min y-max ! if needed
!
! parameters with changes wrt default values:
TRUSTED_REGION=0.00 1.5  ! partially use corners of detectors; 1.41421=full use
VALUE_RANGE_FOR_TRUSTED_DETECTOR_PIXELS=7000. 30000. ! often 8000 is ok
MINIMUM_ZETA=0.05        ! integrate close to the Lorentz zone; 0.15 is default
STRONG_PIXEL=6           ! COLSPOT: only use strong reflections (default is 3)
MINIMUM_NUMBER_OF_PIXELS_IN_A_SPOT=3 ! default of 6 is sometimes too high
REFINE(INTEGRATE)=CELL BEAM ORIENTATION ! AXIS DISTANCE 

! parameters specifically for this detector and beamline:
DETECTOR= ADSC MINIMUM_VALID_PIXEL_VALUE= 1 OVERLOAD= 65000
SENSOR_THICKNESS=0.01 SILICON=34.812736
NX= 6144 NY= 6144  QX= 0.051294  QY= 0.051294 ! to make CORRECT happy if frames are unavailable
DIRECTION_OF_DETECTOR_X-AXIS=1 0 0
DIRECTION_OF_DETECTOR_Y-AXIS=0 1 0
INCIDENT_BEAM_DIRECTION=0 0 1
ROTATION_AXIS=-1 0 0    ! at e.g. SERCAT ID-22 this needs to be -1 0 0
FRACTION_OF_POLARIZATION=0.98   ! better value is provided by beamline staff!
POLARIZATION_PLANE_NORMAL=0 1 0
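The geometry keywords in this file are enough to estimate the resolution reached in the detector corners. The sketch below assumes a flat detector perpendicular to the beam (as the DIRECTION_OF_DETECTOR_X-AXIS, DIRECTION_OF_DETECTOR_Y-AXIS and INCIDENT_BEAM_DIRECTION lines indicate), ignores parallax and uses QX for both pixel directions; `resolution_at_pixel` is a hypothetical helper. With the values above it gives roughly 0.6 Å, in line with the 0.60 Å limit of the last resolution shell in the CORRECT.LP tables below.

```python
import math


def resolution_at_pixel(x, y, orgx, orgy, pixel_mm, distance_mm, wavelength):
    """Bragg spacing d (Angstrom) at detector position (x, y) in pixels.

    Assumes a flat detector perpendicular to the incident beam, with the
    direct beam at (orgx, orgy); parallax and detector tilt are ignored.
    """
    r_mm = math.hypot(x - orgx, y - orgy) * pixel_mm  # distance from beam centre
    two_theta = math.atan2(r_mm, distance_mm)         # scattering angle 2*theta
    return wavelength / (2.0 * math.sin(two_theta / 2.0))  # Bragg's law


# Values from the XDS.INP of sweep e above
ORGX, ORGY, NX, NY = 3130, 3040, 6144, 6144
QX, DISTANCE, WAVELENGTH = 0.051294, 99.9954, 0.6525486

corners = [(0, 0), (NX, 0), (0, NY), (NX, NY)]
d_corner = min(
    resolution_at_pixel(x, y, ORGX, ORGY, QX, DISTANCE, WAVELENGTH) for x, y in corners
)
print(f"best resolution in a detector corner: {d_corner:.3f} A")
```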

CORRECT.LP 1st pass

STANDARD DEVIATION OF SPOT    POSITION (PIXELS)     0.87
STANDARD DEVIATION OF SPINDLE POSITION (DEGREES)    0.10
CRYSTAL MOSAICITY (DEGREES)     0.126

...

    a        b          ISa
6.630E+00  1.091E-04   37.18

...

SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  Rmrgd-F  Anomal  SigAno   Nano
  LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

    1.77        9195    4841      9501       51.0%       1.5%      1.5%     8708   48.74     2.1%     1.6%     0%   0.000       0
    1.26       29991   15327     16721       91.7%       1.5%      1.6%    29328   45.26     2.1%     1.7%     0%   0.000       0
    1.03       38643   19731     21636       91.2%       1.7%      1.7%    37824   38.67     2.5%     2.1%     0%   0.000       0
    0.89       46156   23404     25561       91.6%       2.3%      2.4%    45504   27.56     3.3%     3.4%     0%   0.000       0
    0.80       51509   26034     28868       90.2%       4.0%      4.0%    50950   17.55     5.6%     7.0%     0%   0.000       0
    0.73       55989   28253     32034       88.2%       7.0%      6.8%    55472   10.98     9.8%    13.2%     0%   0.000       0
    0.68       59733   30115     34776       86.6%      13.1%     13.0%    59236    6.08    18.6%    26.0%     0%   0.000       0
    0.63       35385   18436     37367       49.3%      25.6%     26.9%    33898    2.99    36.3%    52.1%     0%   0.000       0
    0.60        8991    4972     39725       12.5%      51.2%     56.9%     8038    1.34    72.4%   105.0%     0%   0.000       0
   total      335592  171113    246189       69.5%       2.3%      2.4%   328958   19.58     3.3%     7.4%     0%   0.000       0


NUMBER OF REFLECTIONS IN SELECTED SUBSET OF IMAGES  343716
NUMBER OF REJECTED MISFITS                            8112
NUMBER OF SYSTEMATIC ABSENT REFLECTIONS                  0
NUMBER OF ACCEPTED OBSERVATIONS                     335604
NUMBER OF UNIQUE ACCEPTED REFLECTIONS               171119

The number of "misfits" (rejections) is higher than the expected 1%. Either one considers the anomalous signal (of the 10 sulfur atoms of lysozyme) to be significant, or one simply increases WFAC1 from its default of 1 to, say, 1.2.
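The misfit rate can be checked from the CORRECT.LP summary lines. A short Python sketch, using the numbers from the three CORRECT.LP excerpts of sweep e on this page (the helper name is hypothetical); in these excerpts the selected subset is exactly the accepted observations plus the rejected misfits.

```python
# (subset of images, accepted observations, rejected misfits) from CORRECT.LP of sweep e
passes = {
    "1st pass": (343716, 335604, 8112),
    "optimization pass": (344751, 336909, 7842),
    "FRIEDEL'S_LAW=FALSE": (345355, 342907, 2448),
}


def misfit_fraction(subset: int, misfits: int) -> float:
    """Fraction of the observations in the selected subset rejected as misfits."""
    return misfits / subset


for label, (subset, accepted, misfits) in passes.items():
    # Consistency check: subset = accepted + misfits in these excerpts
    assert accepted + misfits == subset
    print(f"{label}: {misfit_fraction(subset, misfits):.2%} misfits")
```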

XDS.INP; optimized

Using the output of "grep _E INTEGRATE.LP|tail -2", edit XDS.INP to contain

JOB= INTEGRATE CORRECT
BEAM_DIVERGENCE=   0.428  BEAM_DIVERGENCE_E.S.D.=   0.043
REFLECTING_RANGE=  0.880  REFLECTING_RANGE_E.S.D.=  0.126
... 
REFINE(INTEGRATE)= !

Then "cp GXPARM.XDS XPARM.XDS", and then another round of "xds_par". Five minutes later, we get:
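When several sweeps are processed the same way, the edits for this optimization pass can be scripted. The following is a hypothetical Python sketch, not an XDS tool: it only looks at the first keyword of each non-comment line of XDS.INP, replaces the first matching line as a whole, and appends keywords it does not find (so a keyword that sits second on a line, such as SILICON= above, would end up duplicated).

```python
import shutil
from pathlib import Path
from typing import Dict


def set_keywords(xds_inp: str, settings: Dict[str, str]) -> str:
    """Return a copy of XDS.INP text with the keywords in `settings` set.

    Only the leading keyword of each non-comment line is inspected; the first
    matching line is replaced as a whole, missing keywords are appended.
    """
    remaining = dict(settings)
    out = []
    for line in xds_inp.splitlines():
        stripped = line.lstrip()
        key = None
        if "=" in stripped and not stripped.startswith("!"):
            key = stripped.split("=", 1)[0].strip()
        if key in remaining:
            out.append(f"{key}= {remaining.pop(key)}".rstrip())
        else:
            out.append(line)
    out.extend(f"{k}= {v}".rstrip() for k, v in remaining.items())
    return "\n".join(out) + "\n"


def prepare_optimization_pass(sweep_dir: str, integrate_params: Dict[str, str]) -> None:
    """Hypothetical helper: edit XDS.INP, then copy GXPARM.XDS to XPARM.XDS."""
    workdir = Path(sweep_dir)
    inp = workdir / "XDS.INP"
    settings = {"JOB": "INTEGRATE CORRECT", "REFINE(INTEGRATE)": "", **integrate_params}
    inp.write_text(set_keywords(inp.read_text(), settings))
    shutil.copyfile(workdir / "GXPARM.XDS", workdir / "XPARM.XDS")


# Example call (not executed here), with the values of sweep e:
# prepare_optimization_pass("e", {
#     "BEAM_DIVERGENCE": "0.428  BEAM_DIVERGENCE_E.S.D.=   0.043",
#     "REFLECTING_RANGE": "0.880  REFLECTING_RANGE_E.S.D.=  0.126",
# })
```

An empty value writes "REFINE(INTEGRATE)=", i.e. no refinement in INTEGRATE, matching the "REFINE(INTEGRATE)= !" line above.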

CORRECT.LP optimization pass

This looks a little better: smaller standard deviations, higher ISa, better R-factors, fewer misfits:

STANDARD DEVIATION OF SPOT    POSITION (PIXELS)     0.83
STANDARD DEVIATION OF SPINDLE POSITION (DEGREES)    0.08
CRYSTAL MOSAICITY (DEGREES)     0.096

    a        b          ISa
6.439E+00  1.076E-04   37.98

...

SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  Rmrgd-F  Anomal  SigAno   Nano
  LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

    1.77        9149    4817      9501       50.7%       1.5%      1.5%     8664   49.75     2.1%     1.5%     0%   0.000       0
    1.26       30049   15348     16723       91.8%       1.5%      1.6%    29402   46.26     2.1%     1.6%     0%   0.000       0
    1.03       38920   19863     21637       91.8%       1.7%      1.7%    38114   39.61     2.4%     2.0%     0%   0.000       0
    0.89       46381   23508     25562       92.0%       2.2%      2.3%    45746   28.39     3.1%     3.2%     0%   0.000       0
    0.80       51605   26071     28868       90.3%       3.8%      3.8%    51068   18.21     5.3%     6.5%     0%   0.000       0
    0.73       56126   28314     32041       88.4%       6.6%      6.4%    55624   11.45     9.3%    12.3%     0%   0.000       0
    0.68       59735   30093     34771       86.5%      12.6%     12.3%    59284    6.34    17.8%    24.8%     0%   0.000       0
    0.63       35754   18620     37370       49.8%      24.1%     25.5%    34268    3.11    34.1%    48.9%     0%   0.000       0
    0.60        9180    5075     39730       12.8%      48.6%     54.3%     8210    1.40    68.7%   100.5%     0%   0.000       0
   total      336899  171709    246203       69.7%       2.2%      2.3%   330380   20.14     3.2%     6.9%     0%   0.000       0


NUMBER OF REFLECTIONS IN SELECTED SUBSET OF IMAGES  344751
NUMBER OF REJECTED MISFITS                            7842
NUMBER OF SYSTEMATIC ABSENT REFLECTIONS                  0
NUMBER OF ACCEPTED OBSERVATIONS                     336909
NUMBER OF UNIQUE ACCEPTED REFLECTIONS               171714

further optimization

Another round of optimization improves the R-factors and I/sigma at high resolution a bit further, but it also increases the number of misfits back to about 8200. At this point I decided to switch to FRIEDEL'S_LAW=FALSE, and the resulting table is:

      NOTE:      Friedel pairs are treated as different reflections.

SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  Rmrgd-F  Anomal  SigAno   Nano
  LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

    1.77        9599    9023     19002       47.5%       1.5%      1.5%     1152   36.81     2.1%     1.6%     0%   0.000       0
    1.26       31196   28239     33446       84.4%       1.4%      1.6%     5914   34.40     2.0%     1.6%     0%   0.000       0
    1.03       40125   35205     43274       81.4%       1.7%      1.7%     9840   30.09     2.4%     2.0%     0%   0.000       0
    0.89       46987   40188     51124       78.6%       2.3%      2.3%    13598   22.03     3.2%     3.4%     0%   0.000       0
    0.80       52229   43723     57738       75.7%       3.9%      3.9%    17012   14.44     5.5%     6.6%     0%   0.000       0
    0.73       56830   46674     64088       72.8%       7.1%      6.8%    20312    9.30    10.1%    13.2%     0%   0.000       0
    0.68       60488   48814     69544       70.2%      13.9%     13.5%    23348    5.26    19.6%    27.1%     0%   0.000       0
    0.63       36190   28598     74736       38.3%      28.2%     29.7%    15184    2.70    39.8%    57.3%     0%   0.000       0
    0.60        9246    7246     79466        9.1%      57.8%     62.4%     4000    1.26    81.8%   122.0%     0%   0.000       0
   total      342890  287710    492418       58.4%       2.8%      2.8%   110360   16.19     3.9%     9.9%     0%   0.000       0


NUMBER OF REFLECTIONS IN SELECTED SUBSET OF IMAGES  345355
NUMBER OF REJECTED MISFITS                            2448
NUMBER OF SYSTEMATIC ABSENT REFLECTIONS                  0
NUMBER OF ACCEPTED OBSERVATIONS                     342907
NUMBER OF UNIQUE ACCEPTED REFLECTIONS               287724

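Indeed, this brings the number of misfits well below 1%, which makes some sense: with Friedel pairs treated as different reflections, genuine anomalous differences are no longer rejected as outliers.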
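The effect of FRIEDEL'S_LAW on the reflection counts can be seen from how reflections are merged in P1: with TRUE, a reflection and its Friedel mate (-h,-k,-l) share one unique index; with FALSE they are kept apart, which is why the number of possible unique reflections roughly doubles (246203 vs. 492418 in the tables above). A minimal Python sketch with hypothetical helper names, valid for P1 only (other space groups have further symmetry equivalents).

```python
from typing import Iterable, Tuple

HKL = Tuple[int, int, int]


def unique_key(hkl: HKL, friedel_law: bool) -> HKL:
    """Merging key of a reflection in space group P1.

    With Friedel's law, (h,k,l) and (-h,-k,-l) map to the same key
    (the lexicographically larger of the two); without it, every index is its own key.
    """
    h, k, l = hkl
    if not friedel_law:
        return (h, k, l)
    return max((h, k, l), (-h, -k, -l))


def count_unique(observations: Iterable[HKL], friedel_law: bool) -> int:
    return len({unique_key(hkl, friedel_law) for hkl in observations})


obs = [(1, 2, 3), (-1, -2, -3), (1, 2, 3), (0, 0, 4), (0, 0, -4), (2, -1, 0)]
print(count_unique(obs, friedel_law=True), count_unique(obs, friedel_law=False))  # 3 5
```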

XSCALE results

The same strategy as shown for sweep e was used for sweeps a-d and f-h. XSCALE.INP is:

SPACE_GROUP_NUMBER=    1
UNIT_CELL_CONSTANTS= 27.07 31.25 33.76 87.98 108.00 112.11 !  from 2vb1 PDB entry
! cellparm for a-h gives  27.083    31.269    33.773    87.978   107.998   112.133
OUTPUT_FILE=lys-xds.ahkl
FRIEDEL'S_LAW=TRUE
RESOLUTION_SHELLS=2.91 2.06 1.68 1.45 1.30 1.19 1.10 1.03 0.97 0.92 0.88 0.84 0.81 0.78 0.75 0.73 0.71 0.69 0.67 0.65

INPUT_FILE=../a/XDS_ASCII.HKL
INCLUDE_RESOLUTION_RANGE=30 0.65
INPUT_FILE=../b/XDS_ASCII.HKL
INCLUDE_RESOLUTION_RANGE=30 0.65
INPUT_FILE=../c/XDS_ASCII.HKL
INCLUDE_RESOLUTION_RANGE=30 0.65
INPUT_FILE=../d/XDS_ASCII.HKL
INCLUDE_RESOLUTION_RANGE=30 0.65
INPUT_FILE=../e/XDS_ASCII.HKL
INCLUDE_RESOLUTION_RANGE=30 0.65
INPUT_FILE=../f/XDS_ASCII.HKL
INCLUDE_RESOLUTION_RANGE=30 0.65
INPUT_FILE=../g/XDS_ASCII.HKL
INCLUDE_RESOLUTION_RANGE=30 0.65
INPUT_FILE=../h/XDS_ASCII.HKL
INCLUDE_RESOLUTION_RANGE=30 0.65
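The twenty RESOLUTION_SHELLS limits above are equally spaced in 1/d². Using d_i = 0.65·√(20/i) Å for i = 1 to 20 and rounding to two decimals reproduces the list exactly. The sketch below uses this to regenerate an XSCALE.INP for a list of sweep directories. The helper names `resolution_shells` and `xscale_inp` are hypothetical. The keywords, cell and file names are copied from the file above, and the commented cellparm line is omitted.

```python
import math


def resolution_shells(d_min: float, n: int) -> list[float]:
    """Upper limits of n shells equally spaced in 1/d^2; the last one is d_min."""
    return [d_min * math.sqrt(n / i) for i in range(1, n + 1)]


def xscale_inp(sweeps, d_low: float = 30.0, d_min: float = 0.65, n_shells: int = 20) -> str:
    """Build an XSCALE.INP like the one above (hypothetical helper)."""
    shells = " ".join(f"{d:.2f}" for d in resolution_shells(d_min, n_shells))
    lines = [
        "SPACE_GROUP_NUMBER=    1",
        "UNIT_CELL_CONSTANTS= 27.07 31.25 33.76 87.98 108.00 112.11 !  from 2vb1 PDB entry",
        "OUTPUT_FILE=lys-xds.ahkl",
        "FRIEDEL'S_LAW=TRUE",
        f"RESOLUTION_SHELLS={shells}",
        "",
    ]
    for sweep in sweeps:
        # one INPUT_FILE per sweep directory, followed by its resolution range
        lines.append(f"INPUT_FILE=../{sweep}/XDS_ASCII.HKL")
        lines.append(f"INCLUDE_RESOLUTION_RANGE={d_low:g} {d_min:g}")
    return "\n".join(lines) + "\n"


print(xscale_inp(["a", "b", "c", "d", "e", "f", "g", "h"]))
```

Redirecting the printed text to XSCALE.INP gives a file equivalent to the one shown, apart from the omitted comment line.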

XSCALE.LP tables

The error model is adjusted by XSCALE:

    a        b          ISa    ISa0   INPUT DATA SET
7.094E+00  1.294E-04   33.00   38.03 ../a/XDS_ASCII.HKL                                
7.476E+00  1.170E-04   33.81   38.95 ../b/XDS_ASCII.HKL                                
7.453E+00  1.598E-04   28.98   38.00 ../c/XDS_ASCII.HKL                                
6.539E+00  1.640E-04   30.54   39.08 ../d/XDS_ASCII.HKL                                
7.304E+00  1.342E-04   31.94   37.69 ../e/XDS_ASCII.HKL                                
8.201E+00  1.574E-04   27.83   35.58 ../f/XDS_ASCII.HKL                                
8.182E+00  1.759E-04   26.36   27.60 ../g/XDS_ASCII.HKL                                
7.717E+00  3.694E-04   18.73   21.93 ../h/XDS_ASCII.HKL                                

and about 1500 reflections are rejected. It is reassuring that the error model works well. ISa decreases toward sweep h, probably because the crystal degrades. See also the a posteriori remarks below: sweep h is the one most affected by a shadow on the detector.
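As I understand the XDS error model, each σ(I) is rescaled as σ'² = a(σ² + b·I²). For very strong reflections, I/σ' therefore approaches 1/√(ab), and this limit is the ISa column. The sketch below recomputes ISa from the a and b values in the table. Because a and b are printed with only four significant digits, the results agree with the listed ISa to within a few hundredths.

```python
import math

# Error-model parameters (a, b) per sweep, copied from the XSCALE.LP table above
ERROR_MODEL = {
    "a": (7.094, 1.294e-4),
    "b": (7.476, 1.170e-4),
    "c": (7.453, 1.598e-4),
    "d": (6.539, 1.640e-4),
    "e": (7.304, 1.342e-4),
    "f": (8.201, 1.574e-4),
    "g": (8.182, 1.759e-4),
    "h": (7.717, 3.694e-4),
}


def isa(a: float, b: float) -> float:
    """Limiting I/sigma for strong reflections if sigma'^2 = a*(sigma^2 + b*I^2)."""
    return 1.0 / math.sqrt(a * b)


for sweep, (a, b) in ERROR_MODEL.items():
    print(f"{sweep}: ISa = {isa(a, b):.2f}")
```

Sweep h has the lowest ISa, mainly because of its larger b.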

SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  Rmrgd-F  Anomal  SigAno   Nano
  LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

    2.91       16170    2112      2147       98.4%       2.2%      2.4%    16157   78.96     2.5%     1.1%   -12%   0.741    2023
    2.06       40349    3831      3856       99.4%       2.4%      2.7%    40345   84.89     2.6%     0.9%    -9%   0.764    3803
    1.68       65329    5068      5087       99.6%       3.1%      3.2%    65321   83.77     3.3%     1.0%     0%   0.847    5020
    1.45       73373    6147      6163       99.7%       3.2%      3.5%    73371   78.02     3.4%     1.0%     2%   0.842    6053
    1.30       71196    6651      6657       99.9%       3.2%      3.5%    71196   71.07     3.4%     1.1%     4%   0.857    6503
    1.19       74542    7287      7298       99.8%       3.2%      3.4%    74534   67.06     3.3%     1.2%     5%   0.854    7060
    1.10       84918    8269      8278       99.9%       3.4%      3.7%    84891   63.24     3.6%     1.3%     7%   0.853    7988
    1.03       87890    8584      8603       99.8%       4.1%      4.4%    87855   56.26     4.4%     1.5%     5%   0.818    8231
    0.97       92917    9460      9465       99.9%       5.2%      5.6%    92894   48.90     5.5%     1.7%     4%   0.795    9010
    0.92       83994    9911      9927       99.8%       5.7%      6.3%    83969   41.67     6.0%     2.0%     6%   0.787    9358
    0.88       74100    9620      9621      100.0%       6.3%      7.1%    74082   35.74     6.7%     2.5%     4%   0.772    9040
    0.84       81322   11511     11518       99.9%       6.9%      7.7%    81300   30.43     7.3%     3.3%     1%   0.760   10609
    0.81       67539   10239     10247       99.9%       7.1%      7.7%    67518   25.96     7.7%     4.2%     2%   0.779    9364
    0.78       73980   11807     11817       99.9%       7.1%      7.3%    73951   22.34     7.7%     5.3%     2%   0.799   10699
    0.75       86111   13831     13839       99.9%       8.4%      8.6%    86076   18.77     9.2%     6.8%     2%   0.809   12496
    0.73       64554   10481     10488       99.9%      10.3%     10.4%    64525   15.73    11.3%     8.2%     3%   0.815    9384
    0.71       71891   11727     11741       99.9%      12.8%     13.0%    71844   12.95    14.0%    10.6%     3%   0.810   10436
    0.69       80168   13157     13163      100.0%      16.6%     16.9%    80065   10.16    18.2%    14.1%     2%   0.799   11662
    0.67       84431   14747     14766       99.9%      22.2%     22.7%    84231    7.44    24.4%    19.7%     3%   0.798   12520
    0.65       61031   15592     16551       94.2%      27.6%     30.6%    60165    4.36    31.8%    33.1%     1%   0.723    9005
   total     1435805  190032    191232       99.4%       3.1%      3.3%  1434290   33.42     3.3%     3.1%     3%   0.801  170264

If two more resolution shells (0.64 and 0.63 Å) are added, they look like this:

    0.64       23276    7411      9155       81.0%      35.0%     40.6%    22324    2.90    41.7%    47.9%     3%   0.683    3204
    0.63       18044    6488      9647       67.3%      42.2%     49.7%    16630    2.22    50.7%    60.9%    -5%   0.643    2437

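So there is still useful signal beyond 0.65 Å. Mean I/sigma is 2.90 and 2.22 in the two extra shells, although completeness drops to 81.0% and 67.3%.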
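Deciding where "useful signal" ends requires a criterion. The sketch below assumes a mean I/sigma threshold of 2.0, which is a common rule of thumb and not necessarily the criterion used on this page. It applies that threshold to the outermost shells from the two tables above. Note that the 0.64 and 0.63 Å rows come from the run with two extra shells. `resolution_cutoff` is a hypothetical helper.

```python
# (upper resolution limit in Å, mean I/sigma), copied from the XSCALE.LP tables above;
# the 0.64 and 0.63 Å rows are from the run with two additional shells
SHELLS = [(0.69, 10.16), (0.67, 7.44), (0.65, 4.36), (0.64, 2.90), (0.63, 2.22)]


def resolution_cutoff(shells, min_i_over_sigma=2.0):
    """Walk outward from low to high resolution and return the limit of the last
    shell whose mean I/sigma still meets the threshold (None if the first fails)."""
    cutoff = None
    for d, i_over_sigma in sorted(shells, reverse=True):  # largest d first
        if i_over_sigma < min_i_over_sigma:
            break
        cutoff = d
    return cutoff


print(resolution_cutoff(SHELLS))       # 0.63
print(resolution_cutoff(SHELLS, 3.0))  # 0.65
```

A single I/sigma threshold ignores other factors, such as the drop in completeness in the outermost shells, so the cutoff should be treated as a starting point.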

Some a posteriori remarks

  • For sweeps e-h, one should use TRUSTED_REGION=0 1.2, since that already reaches 0.626 Å in the corners.
  • The first and last frames of sweeps g and h show a shadow in one corner of the detector. I did nothing to exclude this shadow from processing. One should do so, at least if the resolution is to be extended beyond 0.65 Å, which the XSCALE statistics suggest is possible.
    One could experiment with MINIMUM_VALID_PIXEL_VALUE=40 (or so) instead of 1. I would probably try that, but one does not want to exclude valid pixels, so the result has to be checked carefully.
    In any case, XDS has no general facility to exclude bad areas on specific frames of a dataset; one needs to split the dataset into parts and deal with each shadow separately.
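Splitting a sweep in this way amounts to choosing DATA_RANGE values for separate XDS runs. Each shadowed part would then get its own extra UNTRUSTED_RECTANGLE, and all parts would go into XSCALE as separate INPUT_FILEs. The sketch below only computes the frame ranges. The frame numbers in the example are hypothetical, because the actual number of shadowed frames in sweeps g and h is not given here. `split_sweep` is likewise a hypothetical helper.

```python
def split_sweep(first: int, last: int, n_head: int, n_tail: int):
    """Split frames first..last into (label, start, end) parts: a shadowed head,
    a clean middle and a shadowed tail. Empty head or tail parts are omitted."""
    if n_head < 0 or n_tail < 0 or n_head + n_tail >= last - first + 1:
        raise ValueError("shadowed parts must leave at least one clean frame")
    parts = []
    if n_head:
        parts.append(("shadow_head", first, first + n_head - 1))
    parts.append(("clean", first + n_head, last - n_tail))
    if n_tail:
        parts.append(("shadow_tail", last - n_tail + 1, last))
    return parts


# Hypothetical sweep of frames 1-180 with a shadow on the first and last 10 frames
for label, start, end in split_sweep(1, 180, 10, 10):
    print(f"{label}: DATA_RANGE={start} {end}")
```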

Comparison of data processing: published (2006) vs XDS results

                              published (2006)          XDS Version Dec 06, 2010
resolution                    30-0.65Å (0.67-0.65Å)     30-0.65Å (0.67-0.65Å)
observations                  1331953 (12764)           1435805 (61031)
unique reflections            187165 (6353)             190032 (15592)
multiplicity                  7.1 (2.7)                 7.5 (3.9)
completeness (%)              97.6 (67.3)               99.4 (94.2)
R merge (%)                   4.5 (18.4)                3.1 (27.6)
mean I/sigma                  36.2 (4.2)                33.4 (4.4)

Values in parentheses refer to the highest resolution range (0.67-0.65Å).

Availability of data from XDS processing

I changed XSCALE.INP to have

!FRIEDEL'S_LAW=TRUE  ! by commenting it out XSCALE will use FRIEDEL'S_LAW=FALSE
!                      since this is how the data were processed
RESOLUTION_SHELLS=2.91 2.06 1.68 1.45 1.30 1.19 1.10 1.03 0.97 0.92 0.88 0.84 0.80 0.76 0.73 0.70 0.67 0.65 0.64 0.63

and ran XSCALE again to obtain a file with reflections to 0.63 Å.

Conversion to other program systems is done with XDSCONV. The XDSCONV.INP for producing an MTZ file with intensities and anomalous signal is:

INPUT_FILE= lys-xds.ahkl
OUTPUT_FILE=temp.hkl CCP4_I

After running xdsconv, I copied and pasted the commands shown in its screen output:

f2mtz HKLOUT temp.mtz<F2MTZ.INP
cad HKLIN1 temp.mtz HKLOUT output_file_name.mtz<<EOF
LABIN FILE 1 ALL
END
EOF

and obtained output_file_name.mtz, which I renamed (mv) to xds-hewl-I.mtz. SFCHECK statistics for this file are here.

Similarly, using OUTPUT_FILE=temp.hkl CCP4, I obtained a file with amplitudes, xds-hewl-F.mtz.
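The conversion steps above can be chained in a small script. This is a sketch that assumes `xdsconv`, `f2mtz` and `cad` are on the PATH. It also assumes that xdsconv writes temp.hkl and F2MTZ.INP in the working directory, as its screen output indicates. The function names are hypothetical. The script writes the final MTZ name directly instead of renaming output_file_name.mtz. Only `xdsconv_inp` is exercised without the external programs.

```python
import subprocess
from pathlib import Path

# cad keywords copied from the commands above
CAD_KEYWORDS = "LABIN FILE 1 ALL\nEND\n"


def xdsconv_inp(kind: str, input_file: str = "lys-xds.ahkl") -> str:
    """XDSCONV.INP text: "CCP4_I" gives intensities, "CCP4" gives amplitudes."""
    if kind not in ("CCP4_I", "CCP4"):
        raise ValueError(f"unsupported output kind: {kind}")
    return f"INPUT_FILE= {input_file}\nOUTPUT_FILE=temp.hkl {kind}\n"


def convert(kind: str, mtz_out: str, workdir: str = ".") -> None:
    """Run xdsconv, f2mtz and cad in workdir (requires XDS and CCP4 on PATH)."""
    wd = Path(workdir)
    (wd / "XDSCONV.INP").write_text(xdsconv_inp(kind))
    subprocess.run(["xdsconv"], cwd=wd, check=True)  # produces temp.hkl and F2MTZ.INP
    with open(wd / "F2MTZ.INP") as keywords:
        subprocess.run(["f2mtz", "HKLOUT", "temp.mtz"], stdin=keywords, cwd=wd, check=True)
    subprocess.run(["cad", "HKLIN1", "temp.mtz", "HKLOUT", mtz_out],
                   input=CAD_KEYWORDS, text=True, cwd=wd, check=True)


# Usage (not executed here):
# convert("CCP4_I", "xds-hewl-I.mtz")
# convert("CCP4", "xds-hewl-F.mtz")
```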