The structure is deposited in the PDB; it was solved with SAD and refined at a resolution of 2.2 Å in spacegroup P4(3)2(1)2 (#96). The data for this project were provided by Jürgen Bosch (SGPP) and are linked to the ACA 2011 workshop website. There are two high-resolution (2 Å) datasets, E1 (wavelength 0.9794 Å) and E2 (wavelength 0.9174 Å), collected with 0.25° increments at an ALS beamline on June 27, 2004, and a weaker dataset collected earlier at an SSRL beamline. We will only use the former two datasets here.

Dataset E1

Use generate_XDS.INP and run xds once. Based on R-factors in the resulting CORRECT.LP, and an inspection of BKGPIX.cbf, I modified XDS.INP to have

INCLUDE_RESOLUTION_RANGE=40 2.1                       ! too weak beyond 2.1 Å
VALUE_RANGE_FOR_TRUSTED_DETECTOR_PIXELS=8000. 30000.  ! raised from 7000 30000 to mask beamstop

and ran xds again.

What's the problem?

This is the excerpt from CORRECT.LP :

SPACE-GROUP         UNIT CELL CONSTANTS            UNIQUE   Rmeas  COMPARED  LATTICE-
  NUMBER      a      b      c   alpha beta gamma                            CHARACTER

      5     145.8  145.7  131.4  90.0  90.0  90.0    9735    24.5    23176    10 mC
     75     103.1  103.1  131.4  90.0  90.0  90.0    5262    23.4    27649    11 tP
     89     103.1  103.1  131.4  90.0  90.0  90.0    2911    22.8    30000    11 tP
     21     145.7  145.8  131.4  90.0  90.0  90.0    5270    23.2    27641    13 oC
      5     145.7  145.8  131.4  90.0  90.0  90.0    9681    24.2    23230    14 mC
      1     102.9  103.2  131.4  90.0  90.0  89.9   18040     6.9    14871    31 aP
  *  16     102.9  103.2  131.4  90.0  90.0  90.0    5568     9.1    27343    32 oP
      3     103.2  102.9  131.4  90.0  90.0  90.0   10536     9.5    22375    35 mP
      3     102.9  103.2  131.4  90.0  90.0  90.0   10496     8.3    22415    33 mP
      3     102.9  131.4  103.2  90.0  90.1  90.0    9770     7.3    23141    34 mP
      1     102.9  103.2  131.4  90.0  90.0  90.1   18040     6.9    14871    44 aP

...

REFINED PARAMETERS:  DISTANCE BEAM ORIENTATION CELL AXIS                   
USING  219412 INDEXED SPOTS
STANDARD DEVIATION OF SPOT    POSITION (PIXELS)     1.01
STANDARD DEVIATION OF SPINDLE POSITION (DEGREES)    0.11
CRYSTAL MOSAICITY (DEGREES)     0.191
DIRECT BEAM COORDINATES (REC. ANGSTROEM)  -0.004789  0.003758  1.021015
DETECTOR COORDINATES (PIXELS) OF DIRECT BEAM    1027.25   1064.20
DETECTOR ORIGIN (PIXELS) AT                     1036.84   1056.68
CRYSTAL TO DETECTOR DISTANCE (mm)       209.38
LAB COORDINATES OF DETECTOR X-AXIS  1.000000  0.000000  0.000000
LAB COORDINATES OF DETECTOR Y-AXIS  0.000000  1.000000  0.000000
LAB COORDINATES OF ROTATION AXIS  0.999997  0.000527  0.002187
COORDINATES OF UNIT CELL A-AXIS    21.922    52.895    85.337
COORDINATES OF UNIT CELL B-AXIS     3.771    87.158   -54.992
COORDINATES OF UNIT CELL C-AXIS  -128.130    18.914    21.191
REC. CELL PARAMETERS   0.009731  0.009697  0.007620  90.000  90.000  90.000
UNIT CELL PARAMETERS    102.766   103.125   131.241  90.000  90.000  90.000
E.S.D. OF CELL PARAMETERS  1.3E-01 8.6E-02 9.3E-02 0.0E+00 0.0E+00 0.0E+00
SPACE GROUP NUMBER     16

So CORRECT chooses an orthorhombic spacegroup.

The file continues:

...
     a        b          ISa
6.058E+00  3.027E-04   23.35


...

      NOTE:      Friedel pairs are treated as different reflections.

SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  Rmrgd-F  Anomal  SigAno   Nano
  LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

    6.23       17389    5807      6045       96.1%       2.4%      2.8%    17277   35.83     3.0%     2.0%    66%   1.553    2434
    4.43       32116   10536     10787       97.7%       2.7%      3.0%    32057   33.78     3.3%     2.4%    55%   1.272    4762
    3.62       41900   13700     13961       98.1%       3.4%      3.4%    41793   27.98     4.1%     3.6%    38%   1.115    6295
    3.14       51146   16371     16513       99.1%       5.4%      5.3%    50967   18.89     6.6%     7.2%    20%   0.961    7625
    2.81       59159   18627     18675       99.7%      12.7%     13.2%    58877    9.82    15.4%    18.0%     8%   0.818    8716
    2.56       65525   20596     20651       99.7%      28.5%     30.2%    65130    5.19    34.5%    40.4%     3%   0.757    9629
    2.37       71579   22491     22533       99.8%      62.6%     67.1%    71068    2.60    75.6%    88.8%     1%   0.694   10498
    2.22       74065   23837     24094       98.9%      97.9%     97.0%    73444    1.59   118.8%   139.8%    11%   0.738   11051
    2.09       65776   24379     25674       95.0%     133.3%    140.6%    63647    0.90   166.4%   216.0%     1%   0.651   10380
   total      478655  156344    158933       98.4%       6.5%      6.8%   474260   10.65     7.9%    22.5%    16%   0.852   71390


NUMBER OF REFLECTIONS IN SELECTED SUBSET OF IMAGES  492346
NUMBER OF REJECTED MISFITS                           13342
NUMBER OF SYSTEMATIC ABSENT REFLECTIONS                  0
NUMBER OF ACCEPTED OBSERVATIONS                     479004
NUMBER OF UNIQUE ACCEPTED REFLECTIONS               157108

Some comments:

  • the "STANDARD DEVIATION OF SPOT POSITION (PIXELS)" is significantly higher (1.01) than the values reported for the 5°-batches in INTEGRATE.LP (about 0.6). This suggests that the geometry refinement has to deal with inconsistent data.
  • CORRECT obviously indicates an orthorhombic spacegroup.
  • the number of MISFITS is higher than 1% (13342 of 492346 reflections, about 2.7%). From the first long table in CORRECT.LP (the one that is fine-grained in resolution) we learn that the misfits are due to faint high-resolution ice rings - so this is a problem intrinsic to the data, and not to their mode of processing.

To my surprise, pointless does not agree with CORRECT's standpoint:

Scores for each symmetry element
 
Nelmt  Lklhd  Z-cc    CC        N  Rmeas    Symmetry & operator (in Lattice Cell)

  1   0.959   9.91   0.99   65030  0.034     identity
  2   0.959   9.91   0.99  132222  0.035 *** 2-fold l ( 0 0 1)  {-h,-k,+l}
  3   0.958   9.87   0.99  110073  0.044 *** 2-fold h ( 1 0 0)  {+h,-k,-l}
  4   0.942   9.55   0.96  132646  0.109 *** 2-fold   ( 1 1 0)  {+k,+h,-l}
  5   0.958   9.87   0.99  111819  0.043 *** 2-fold k ( 0 1 0)  {-h,+k,-l}
  6   0.941   9.54   0.95  131842  0.109 *** 2-fold   ( 1-1 0)  {-k,-h,-l}
  7   0.937   9.50   0.95  224393  0.107 *** 4-fold l ( 0 0 1)  {-k,+h,+l} {+k,-h,+l}

and

    Laue Group        Lklhd   NetZc  Zc+   Zc-    CC    CC-  Rmeas   R-  Delta ReindexOperator

> 1  P 4/m m m  ***  1.000   9.73  9.73  0.00   0.97  0.00   0.07  0.00   0.2 [h,k,l]
- 2    P m m m       0.000   0.35  9.88  9.53   0.99  0.95   0.04  0.11   0.0 [h,k,l]
  3    C m m m       0.000  -0.02  9.72  9.74   0.97  0.97   0.07  0.07   0.2 [h+k,-h+k,l]
  4      P 4/m       0.000   0.07  9.77  9.70   0.98  0.97   0.06  0.08   0.2 [h,k,l]
  5  P 1 2/m 1       0.000   0.25  9.91  9.66   0.99  0.97   0.03  0.08   0.0 [-h,-l,-k]
  6  P 1 2/m 1       0.000   0.22  9.89  9.67   0.99  0.97   0.04  0.08   0.0 [h,k,l]
  7  P 1 2/m 1       0.000   0.21  9.88  9.67   0.99  0.97   0.04  0.08   0.0 [-k,-h,-l]
  8  C 1 2/m 1       0.000  -0.01  9.72  9.73   0.97  0.97   0.07  0.07   0.2 [h-k,h+k,l]
  9  C 1 2/m 1       0.000  -0.02  9.71  9.73   0.97  0.97   0.07  0.07   0.2 [h+k,-h+k,l]
 10       P -1       0.000   0.21  9.91  9.70   0.99  0.97   0.03  0.08   0.0 [h,k,l]

and

   Spacegroup         TotProb SysAbsProb     Reindex         Conditions
 
   <P 41 21 2> ( 92)    0.823  0.823                         00l: l=4n, h00: h=2n (zones 1,2)
   <P 43 21 2> ( 96)    0.823  0.823                         00l: l=4n, h00: h=2n (zones 1,2)
    ..........
    <P 4 21 2> ( 90)    0.095  0.095                         h00: h=2n (zone 2)
    ..........
   <P 42 21 2> ( 94)    0.077  0.077                         00l: l=2n, h00: h=2n (zones 1,2)

This suggests #92 or #96, the latter of which agrees with the PDB deposition. However, running CORRECT in #96 and specifying 103 103 130 90 90 90 as cell parameters, we obtain:

REFINED PARAMETERS:  DISTANCE BEAM ORIENTATION CELL AXIS                   
USING  220320 INDEXED SPOTS
STANDARD DEVIATION OF SPOT    POSITION (PIXELS)     1.17
STANDARD DEVIATION OF SPINDLE POSITION (DEGREES)    0.14
CRYSTAL MOSAICITY (DEGREES)     0.191
DIRECT BEAM COORDINATES (REC. ANGSTROEM)  -0.004790  0.004009  1.021014
DETECTOR COORDINATES (PIXELS) OF DIRECT BEAM    1027.19   1064.23
DETECTOR ORIGIN (PIXELS) AT                     1036.79   1056.20
CRYSTAL TO DETECTOR DISTANCE (mm)       209.52
LAB COORDINATES OF DETECTOR X-AXIS  1.000000  0.000000  0.000000
LAB COORDINATES OF DETECTOR Y-AXIS  0.000000  1.000000  0.000000
LAB COORDINATES OF ROTATION AXIS  0.999996  0.000901  0.002534
COORDINATES OF UNIT CELL A-AXIS    21.926    53.087    85.553
COORDINATES OF UNIT CELL B-AXIS     3.794    87.060   -54.995
COORDINATES OF UNIT CELL C-AXIS  -128.212    18.926    21.115
REC. CELL PARAMETERS   0.009704  0.009704  0.007616  90.000  90.000  90.000
UNIT CELL PARAMETERS    103.045   103.045   131.310  90.000  90.000  90.000
E.S.D. OF CELL PARAMETERS  2.1E-01 2.1E-01 2.1E-01 0.0E+00 0.0E+00 0.0E+00
SPACE GROUP NUMBER     96

...

    a        b          ISa
7.890E+00  8.793E-04   12.01

...

     NOTE:      Friedel pairs are treated as different reflections.

SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  Rmrgd-F  Anomal  SigAno   Nano
  LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

    6.23       16770    2983      3017       98.9%       5.2%      6.1%    16752   26.20     5.7%     2.6%    55%   1.247    1223
    4.43       30598    5392      5393      100.0%       5.8%      6.2%    30596   25.25     6.3%     3.0%    50%   1.072    2420
    3.62       39822    6992      6994      100.0%       6.9%      6.6%    39820   22.27     7.6%     4.0%    32%   0.975    3215
    3.14       49620    8240      8242      100.0%       9.2%      8.7%    49619   17.14    10.1%     6.2%    19%   0.876    3847
    2.81       59388    9379      9379      100.0%      17.7%     18.1%    59387   10.44    19.3%    12.3%     0%   0.736    4410
    2.56       65652   10308     10310      100.0%      34.6%     39.1%    65652    6.08    37.7%    23.6%    -1%   0.680    4872
    2.37       71744   11258     11259      100.0%      71.3%     83.8%    71744    3.23    77.6%    52.1%    -2%   0.652    5352
    2.22       74888   12065     12082       99.9%     111.0%    116.9%    74888    1.98   121.2%    86.9%     2%   0.718    5753
    2.09       65727   12386     12874       96.2%     151.3%    176.1%    65517    1.12   168.0%   148.4%    -3%   0.631    5797
   total      474209   79003     79550       99.3%      10.3%     11.0%   473975    9.44    11.3%    17.2%    13%   0.772   36889


NUMBER OF REFLECTIONS IN SELECTED SUBSET OF IMAGES  492346
NUMBER OF REJECTED MISFITS                           17898
NUMBER OF SYSTEMATIC ABSENT REFLECTIONS                141
NUMBER OF ACCEPTED OBSERVATIONS                     474307
NUMBER OF UNIQUE ACCEPTED REFLECTIONS                79022

which is much worse than the spacegroup 16 statistics (compare the ISa values - they differ by a factor of 2!), so there may be something wrong with some of the assumptions we were making ...
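The ISa values printed in the two excerpts are consistent with the relation ISa = 1/sqrt(a*b), where a and b are the error-model parameters listed next to them. The following minimal Python sketch reproduces both numbers and their ratio; the function and variable names are ours and are not part of XDS.

```python
import math

def isa(a: float, b: float) -> float:
    """Asymptotic I/sigma implied by the error-model parameters a and b."""
    return 1.0 / math.sqrt(a * b)

# (a, b) pairs copied from the two CORRECT.LP excerpts above
isa_sg16 = isa(6.058e+00, 3.027e-04)  # run in spacegroup 16
isa_sg96 = isa(7.890e+00, 8.793e-04)  # run in spacegroup 96

print(f"ISa (#16) = {isa_sg16:.2f}")
print(f"ISa (#96) = {isa_sg96:.2f}")
print(f"ratio     = {isa_sg16 / isa_sg96:.2f}")
```

Because ISa depends only on the product a*b, a rise in either parameter lowers it; here both a and b are larger in the #96 run.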

Identifying a possible cause

The easiest thing one can do is to inspect INTEGRATE.LP - this lists scale factor, beam divergence and mosaicity for every frame. There's a jiffy called "scalefactors" which greps the relevant lines from INTEGRATE.LP ("scalefactors > scales.log"). A plot of the scale factor (column 3) against frame number demonstrates that "something happens" between frame 372 and 373 (of course one has to look at the table to find the exact numbers).
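Instead of reading the table by eye, the position of the jump can be located with a few lines of Python. This is only a sketch: it assumes that each row of scales.log is whitespace-separated with the frame number in column 1 and the scale factor in column 3, and the sample rows below are invented for illustration, not taken from the real file.

```python
def largest_scale_jump(lines):
    """Return (frame_before, frame_after, ratio) for the biggest relative
    change in scale factor between consecutive rows, or None.

    Assumes whitespace-separated rows with the frame number in column 1
    and the scale factor in column 3; rows that do not parse are skipped.
    """
    rows = []
    for line in lines:
        fields = line.split()
        try:
            rows.append((int(fields[0]), float(fields[2])))
        except (IndexError, ValueError):
            continue  # header or blank line

    best = None
    for (f0, s0), (f1, s1) in zip(rows, rows[1:]):
        if min(s0, s1) <= 0:
            continue  # cannot form a meaningful ratio
        ratio = max(s0, s1) / min(s0, s1)
        if best is None or ratio > best[2]:
            best = (f0, f1, ratio)
    return best

# Hypothetical rows for illustration only, NOT the real scales.log
sample = [
    " IMAGE IER  SCALE",
    "   371   0  1.002",
    "   372   0  0.998",
    "   373   0  0.750",
    "   374   0  0.752",
]
print(largest_scale_jump(sample))
```

For the real data one would pass the open file instead of the sample list, e.g. largest_scale_jump(open("scales.log")).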

It should be noted that any abrupt change in conditions during the experiment is going to spoil the resulting data in one way or another. This is most true for a SAD experiment which is supposed to give accurate values for the tiny differences in intensities between Friedel-related reflections.

A solution

At this point it is good to look at the data for experiment E2. Here, we find exactly the same problems of bad ISa and high "STANDARD DEVIATION OF SPOT POSITION (PIXELS)" when reducing frames 1-591 in one run of xds.

With this knowledge, we are led, for E1, to reduce frames 1-372 and 373-592 separately, in spacegroup 96. For E2, we use frames 1-369 and 371-591, respectively. Frame E2-370 has a very high scale factor so we leave it out altogether.

This is also a good time to closely inspect the headers of the frames:

% grep --binary-files=text DATE j1603b3PK_1_E1_37?.img

gives

j1603b3PK_1_E1_370.img:DATE=Sun Jun 27 08:55:51 2004;
j1603b3PK_1_E1_371.img:DATE=Sun Jun 27 08:56:00 2004;
j1603b3PK_1_E1_372.img:DATE=Sun Jun 27 08:56:08 2004;
j1603b3PK_1_E1_373.img:DATE=Sun Jun 27 09:19:45 2004;
j1603b3PK_1_E1_374.img:DATE=Sun Jun 27 09:19:54 2004;
j1603b3PK_1_E1_375.img:DATE=Sun Jun 27 09:20:02 2004;
j1603b3PK_1_E1_376.img:DATE=Sun Jun 27 09:20:10 2004;
j1603b3PK_1_E1_377.img:DATE=Sun Jun 27 09:20:58 2004;
j1603b3PK_1_E1_378.img:DATE=Sun Jun 27 09:21:08 2004;
j1603b3PK_1_E1_379.img:DATE=Sun Jun 27 09:21:17 2004;

and

% grep --binary-files=text DATE j1603b3PK_1_E2_3[67]?.img

gives

j1603b3PK_1_E2_366.img:DATE=Sun Jun 27 08:55:15 2004;
j1603b3PK_1_E2_367.img:DATE=Sun Jun 27 08:55:23 2004;
j1603b3PK_1_E2_368.img:DATE=Sun Jun 27 08:55:32 2004;
j1603b3PK_1_E2_369.img:DATE=Sun Jun 27 08:56:19 2004;
j1603b3PK_1_E2_370.img:DATE=Sun Jun 27 08:56:28 2004;
j1603b3PK_1_E2_371.img:DATE=Sun Jun 27 09:19:26 2004;
j1603b3PK_1_E2_372.img:DATE=Sun Jun 27 09:19:34 2004;
j1603b3PK_1_E2_373.img:DATE=Sun Jun 27 09:20:22 2004;
j1603b3PK_1_E2_374.img:DATE=Sun Jun 27 09:20:30 2004;
j1603b3PK_1_E2_375.img:DATE=Sun Jun 27 09:20:38 2004;
j1603b3PK_1_E2_376.img:DATE=Sun Jun 27 09:20:47 2004;

thus proving that both datasets were interrupted for about 23 minutes around frame 370.
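The length and position of the interruption can also be extracted from the grep output programmatically. The sketch below parses lines of the form shown above and reports the longest pause between consecutive frames. It assumes an English locale for the day and month names; the function name is ours.

```python
import re
from datetime import datetime

# frame number and timestamp from a line like
# "j1603b3PK_1_E1_372.img:DATE=Sun Jun 27 08:56:08 2004;"
HEADER_RE = re.compile(r"_(\d+)\.img:DATE=(.+?);")

def largest_gap(grep_lines):
    """Return (frame_before, frame_after, seconds) for the longest pause
    between consecutive frames in the grep output."""
    stamps = []
    for line in grep_lines:
        m = HEADER_RE.search(line)
        if m:
            when = datetime.strptime(m.group(2), "%a %b %d %H:%M:%S %Y")
            stamps.append((int(m.group(1)), when))
    stamps.sort()
    gaps = [
        (f0, f1, (t1 - t0).total_seconds())
        for (f0, t0), (f1, t1) in zip(stamps, stamps[1:])
    ]
    return max(gaps, key=lambda g: g[2])

# four of the E1 header lines listed above
e1 = [
    "j1603b3PK_1_E1_371.img:DATE=Sun Jun 27 08:56:00 2004;",
    "j1603b3PK_1_E1_372.img:DATE=Sun Jun 27 08:56:08 2004;",
    "j1603b3PK_1_E1_373.img:DATE=Sun Jun 27 09:19:45 2004;",
    "j1603b3PK_1_E1_374.img:DATE=Sun Jun 27 09:19:54 2004;",
]
before, after, seconds = largest_gap(e1)
print(f"pause of {seconds / 60:.1f} min between frames {before} and {after}")
```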

The really weird thing here is that both datasets appear to have been collected at the same time, but at different wavelengths (E1 at 0.9794 Å, E2 at 0.9184 Å). And yet the individual parts merge in an unexpected way. Using the following XSCALE.INP:

UNIT_CELL_CONSTANTS=103.316   103.316   131.456  90.000  90.000  90.000
SPACE_GROUP_NUMBER=96
OUTPUT_FILE=temp.ahkl
INPUT_FILE=../e1_1-372/XDS_ASCII.HKL
INPUT_FILE=../e1_373-592/XDS_ASCII.HKL
INPUT_FILE=../e2_1-369/XDS_ASCII.HKL
INPUT_FILE=../e2_371-591/XDS_ASCII.HKL

and running xscale, we obtain in XSCALE.LP:

    CORRELATIONS BETWEEN INPUT DATA SETS AFTER CORRECTIONS

DATA SETS  NUMBER OF COMMON  CORRELATION   RATIO OF COMMON   B-FACTOR
 #i   #j     REFLECTIONS     BETWEEN i,j  INTENSITIES (i/j)  BETWEEN i,j

   1    2       15943           0.978            1.0002         0.0106
   1    3       22366           1.000            1.0012        -0.0008
   2    3       15801           0.977            0.9983         0.0557
   1    4       15648           0.979            0.9988         0.0541
   2    4       14862           0.999            1.0024        -0.0007
   3    4       15524           0.978            0.9999        -0.0015

which means that e1_1-372 correlates well (1.000) with e2_1-369, and e1_373-592 well (0.999) with e2_371-591, but the crosswise correlations are consistently lower (0.978, 0.977, 0.979, 0.978). The adjustment to the error model confirms this:

    a        b          ISa    ISa0   INPUT DATA SET
6.112E+00  1.429E-03   10.70   22.37 ../e1_1-372/XDS_ASCII.HKL                         
1.074E+01  1.825E-03    7.14   23.79 ../e1_373-592/XDS_ASCII.HKL                       
5.707E+00  1.621E-03   10.40   22.82 ../e2_1-369/XDS_ASCII.HKL                         
8.547E+00  1.796E-03    8.07   24.17 ../e2_371-591/XDS_ASCII.HKL                       

telling us that "if we merge these datasets together, their error estimates have to be increased a lot". However, if we switch to

UNIT_CELL_CONSTANTS=103.316   103.316   131.456  90.000  90.000  90.000
SPACE_GROUP_NUMBER=96

OUTPUT_FILE=firstparts.ahkl
INPUT_FILE=../e1_1-372/XDS_ASCII.HKL
INPUT_FILE=../e2_1-369/XDS_ASCII.HKL

OUTPUT_FILE=secondparts.ahkl
INPUT_FILE=../e1_373-592/XDS_ASCII.HKL
INPUT_FILE=../e2_371-591/XDS_ASCII.HKL

we obtain

    a        b          ISa    ISa0   INPUT DATA SET
6.120E+00  3.673E-04   21.09   22.37 ../e1_1-372/XDS_ASCII.HKL                         
5.713E+00  3.819E-04   21.41   22.82 ../e2_1-369/XDS_ASCII.HKL                         
5.639E+00  3.151E-04   23.72   23.79 ../e1_373-592/XDS_ASCII.HKL                       
5.289E+00  3.258E-04   24.09   24.17 ../e2_371-591/XDS_ASCII.HKL                       

proving that the second parts of datasets E1 and E2 should be treated separately from the first parts.
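The same grouping can be read off the XSCALE correlation table mechanically. The sketch below joins datasets whose pairwise correlation reaches a cut-off, using a small union-find. The cut-off of 0.99 is an arbitrary value chosen for this illustration, and the function name is ours.

```python
def group_by_correlation(pairs, threshold):
    """Group dataset numbers whose pairwise correlation is >= threshold
    (connected components via a small union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j, cc in pairs:
        ri, rj = find(i), find(j)
        if cc >= threshold and ri != rj:
            parent[rj] = ri

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())

# (i, j, correlation) copied from the XSCALE.LP correlation table
pairs = [
    (1, 2, 0.978), (1, 3, 1.000), (2, 3, 0.977),
    (1, 4, 0.979), (2, 4, 0.999), (3, 4, 0.978),
]
# 0.99 is an assumed cut-off, not a value recommended by XSCALE
print(group_by_correlation(pairs, threshold=0.99))
```

With the input order used in XSCALE.INP, datasets 1 and 3 are e1_1-372 and e2_1-369, and datasets 2 and 4 are e1_373-592 and e2_371-591.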

Upon inspection of the cell parameters, we find that the cell axes of the second "halves" are shorter by a factor of 0.9908 when compared with the first parts. This suggests that they were collected at a longer wavelength than the first parts! But then the wavelength values in the headers are most likely completely wrong: we can speculate that the two first parts were collected at the SeMet peak wavelength, and the two second parts at the inflection wavelength.
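The link between an apparently shorter cell and a longer true wavelength can be written out with Bragg's law. This is a sketch under two assumptions that are ours: the true cell is the same in both parts of a dataset, and the detector geometry is held fixed, so that the scattering angle θ of each reflection is determined by the experiment alone. The symbols λ_hdr (the header wavelength used in processing) and λ_1, λ_2 (the true wavelengths of the first and second part) are introduced here for illustration.

```latex
\begin{align*}
d_{\mathrm{true}} &= \frac{\lambda_{\mathrm{true}}}{2\sin\theta},
\qquad
d_{\mathrm{app}} = \frac{\lambda_{\mathrm{hdr}}}{2\sin\theta}
\;\Longrightarrow\;
a_{\mathrm{app}} = a_{\mathrm{true}}\,\frac{\lambda_{\mathrm{hdr}}}{\lambda_{\mathrm{true}}} \\[4pt]
\frac{a_{\mathrm{app},2}}{a_{\mathrm{app},1}}
&= \frac{\lambda_{1}}{\lambda_{2}} = 0.9908
\;\Longrightarrow\;
\frac{\lambda_{2}}{\lambda_{1}} = \frac{1}{0.9908} \approx 1.0093
\end{align*}
```

Within one dataset λ_hdr is the same for both parts and cancels in the ratio. In practice, refinement of the detector distance can absorb part of a wavelength error, so the factor should be read as approximate.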

The almost-simultaneous DATEs in the headers may be explained by an inverse-beam measuring strategy which alternately collects 4 frames in one orientation as E1, then rotates the spindle by 180° and collects 4 frames into E2. For some reason, the beamline software did not write the correct wavelength into the headers.

So this little detective work appears to tell us what happened on the morning of Sunday June 27, 2004 at ALS beamline 821.

Further analysis of datasets E1 and E2

Here, we try to learn more about the constituents of "firstparts".

Running "xdsstat > XDSSTAT.LP" in the e1_1-372 and e2_1-369 directories, we obtain statistics output not available from CORRECT. We open XDSSTAT.LP with the CCP4 program "loggraph", and take a look at misfits.pck, rf.pck, and the other files produced by xdsstat, using VIEW or XDS-Viewer:

  • Reflections and misfits, by frame - looks normal
  • Intensity and sigma by frame - looks normal
  • "partiality" and profile agreement, by frame - looks good but it's clear that the profiles at high frame number agree worse with the average profiles, possibly due to radiation damage
  • R_meas, by frame, clearly showing good R_meas in the middle of the dataset
  • R_d - an R-factor which directly depends on radiation damage. This is calculated as a function of frame number difference and the linear rise indicates significant radiation damage that should be correctable in XSCALE, using the CRYSTAL_NAME keyword
  • misfits mapped on the detector, showing ice rings
  • R_meas mapped on the detector, showing elevated R_meas at the location of the ice rings

Solving the structure

Although we could now think of using these two files ("firstparts" and "secondparts" merged) and assume that they are peak and inflection wavelengths, it appears more reasonable to try and solve the structure with SAD - which means using "firstparts" only.

To make sure we haven't overlooked anything