1Y13
The structure is deposited in the PDB; it was solved with SAD and refined at a resolution of 2.2 Å in spacegroup P4(3)2(1)2 (#96). The data for this project were provided by Jürgen Bosch (SGPP) and are linked to the ACA 2011 workshop website. There are two high-resolution (2 Å) datasets, E1 (wavelength 0.9794 Å) and E2 (0.9174 Å), collected with 0.25° increments at an ALS beamline on June 27, 2004, and a weaker dataset collected earlier at an SSRL beamline. We will only use the former two datasets here.
Dataset E1
Use generate_XDS.INP and run xds once. Based on R-factors in the resulting CORRECT.LP, and an inspection of BKGPIX.cbf, I modified XDS.INP to have
INCLUDE_RESOLUTION_RANGE=40 2.1 ! too weak beyond 2.1 Å
VALUE_RANGE_FOR_TRUSTED_DETECTOR_PIXELS=8000. 30000. ! raised from 7000 30000 to mask beamstop
and ran xds again.
Identifying the problem
This is the excerpt from CORRECT.LP :
SPACE-GROUP        UNIT CELL CONSTANTS             UNIQUE  Rmeas  COMPARED  LATTICE-
  NUMBER        a      b      c  alpha  beta gamma                         CHARACTER
       5   145.8  145.7  131.4  90.0  90.0  90.0     9735   24.5     23176    10 mC
      75   103.1  103.1  131.4  90.0  90.0  90.0     5262   23.4     27649    11 tP
      89   103.1  103.1  131.4  90.0  90.0  90.0     2911   22.8     30000    11 tP
      21   145.7  145.8  131.4  90.0  90.0  90.0     5270   23.2     27641    13 oC
       5   145.7  145.8  131.4  90.0  90.0  90.0     9681   24.2     23230    14 mC
       1   102.9  103.2  131.4  90.0  90.0  89.9    18040    6.9     14871    31 aP
  *   16   102.9  103.2  131.4  90.0  90.0  90.0     5568    9.1     27343    32 oP
       3   103.2  102.9  131.4  90.0  90.0  90.0    10536    9.5     22375    35 mP
       3   102.9  103.2  131.4  90.0  90.0  90.0    10496    8.3     22415    33 mP
       3   102.9  131.4  103.2  90.0  90.1  90.0     9770    7.3     23141    34 mP
       1   102.9  103.2  131.4  90.0  90.0  90.1    18040    6.9     14871    44 aP
...
REFINED PARAMETERS: DISTANCE BEAM ORIENTATION CELL AXIS
USING 219412 INDEXED SPOTS
STANDARD DEVIATION OF SPOT POSITION (PIXELS)      1.01
STANDARD DEVIATION OF SPINDLE POSITION (DEGREES)  0.11
CRYSTAL MOSAICITY (DEGREES)      0.191
DIRECT BEAM COORDINATES (REC. ANGSTROEM)   -0.004789  0.003758  1.021015
DETECTOR COORDINATES (PIXELS) OF DIRECT BEAM    1027.25   1064.20
DETECTOR ORIGIN (PIXELS) AT                     1036.84   1056.68
CRYSTAL TO DETECTOR DISTANCE (mm)       209.38
LAB COORDINATES OF DETECTOR X-AXIS  1.000000  0.000000  0.000000
LAB COORDINATES OF DETECTOR Y-AXIS  0.000000  1.000000  0.000000
LAB COORDINATES OF ROTATION AXIS    0.999997  0.000527  0.002187
COORDINATES OF UNIT CELL A-AXIS     21.922    52.895    85.337
COORDINATES OF UNIT CELL B-AXIS      3.771    87.158   -54.992
COORDINATES OF UNIT CELL C-AXIS   -128.130    18.914    21.191
REC. CELL PARAMETERS   0.009731  0.009697  0.007620  90.000  90.000  90.000
UNIT CELL PARAMETERS    102.766   103.125   131.241  90.000  90.000  90.000
E.S.D. OF CELL PARAMETERS  1.3E-01 8.6E-02 9.3E-02 0.0E+00 0.0E+00 0.0E+00
SPACE GROUP NUMBER     16
So CORRECT chooses an orthorhombic spacegroup.
The file continues:
...
     a          b          ISa
 6.058E+00  3.027E-04   23.35
...
NOTE: Friedel pairs are treated as different reflections.
SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION    NUMBER OF REFLECTIONS     COMPLETENESS  R-FACTOR  R-FACTOR  COMPARED  I/SIGMA  R-meas  Rmrgd-F  Anomal  SigAno   Nano
     LIMIT  OBSERVED  UNIQUE  POSSIBLE       OF DATA  observed  expected                                        Corr
      6.23     17389    5807      6045         96.1%      2.4%      2.8%     17277    35.83    3.0%     2.0%     66%   1.553   2434
      4.43     32116   10536     10787         97.7%      2.7%      3.0%     32057    33.78    3.3%     2.4%     55%   1.272   4762
      3.62     41900   13700     13961         98.1%      3.4%      3.4%     41793    27.98    4.1%     3.6%     38%   1.115   6295
      3.14     51146   16371     16513         99.1%      5.4%      5.3%     50967    18.89    6.6%     7.2%     20%   0.961   7625
      2.81     59159   18627     18675         99.7%     12.7%     13.2%     58877     9.82   15.4%    18.0%      8%   0.818   8716
      2.56     65525   20596     20651         99.7%     28.5%     30.2%     65130     5.19   34.5%    40.4%      3%   0.757   9629
      2.37     71579   22491     22533         99.8%     62.6%     67.1%     71068     2.60   75.6%    88.8%      1%   0.694  10498
      2.22     74065   23837     24094         98.9%     97.9%     97.0%     73444     1.59  118.8%   139.8%     11%   0.738  11051
      2.09     65776   24379     25674         95.0%    133.3%    140.6%     63647     0.90  166.4%   216.0%      1%   0.651  10380
     total    478655  156344    158933         98.4%      6.5%      6.8%    474260    10.65    7.9%    22.5%     16%   0.852  71390
NUMBER OF REFLECTIONS IN SELECTED SUBSET OF IMAGES  492346
NUMBER OF REJECTED MISFITS                           13342
NUMBER OF SYSTEMATIC ABSENT REFLECTIONS                  0
NUMBER OF ACCEPTED OBSERVATIONS                     479004
NUMBER OF UNIQUE ACCEPTED REFLECTIONS               157108
Some comments:
- the "STANDARD DEVIATION OF SPOT POSITION (PIXELS)" is significantly higher (1.01) than the values reported for the 5° batches in INTEGRATE.LP (about 0.6). This suggests that the geometry refinement has to deal with inconsistent data.
- CORRECT clearly indicates an orthorhombic spacegroup.
- the number of MISFITS is higher than 1% (13342 of 492346 observations, about 2.7%). From the first long table in CORRECT.LP (the one that is fine-grained in resolution) we learn that the misfits are due to faint high-resolution ice rings, so this is a problem intrinsic to the data, and not to their mode of processing.
To my surprise, pointless does not agree with CORRECT's standpoint:
Scores for each symmetry element
Nelmt  Lklhd  Z-cc    CC        N  Rmeas    Symmetry & operator (in Lattice Cell)
  1    0.959  9.91  0.99    65030  0.034        identity
  2    0.959  9.91  0.99   132222  0.035  ***   2-fold l ( 0 0 1) {-h,-k,+l}
  3    0.958  9.87  0.99   110073  0.044  ***   2-fold h ( 1 0 0) {+h,-k,-l}
  4    0.942  9.55  0.96   132646  0.109  ***   2-fold   ( 1 1 0) {+k,+h,-l}
  5    0.958  9.87  0.99   111819  0.043  ***   2-fold k ( 0 1 0) {-h,+k,-l}
  6    0.941  9.54  0.95   131842  0.109  ***   2-fold   ( 1-1 0) {-k,-h,-l}
  7    0.937  9.50  0.95   224393  0.107  ***   4-fold l ( 0 0 1) {-k,+h,+l} {+k,-h,+l}
and
     Laue Group        Lklhd  NetZc   Zc+   Zc-    CC   CC-  Rmeas    R-  Delta  ReindexOperator
>  1  P 4/m m m   ***  1.000   9.73  9.73  0.00  0.97  0.00   0.07  0.00    0.2  [h,k,l]
-  2  P m m m          0.000   0.35  9.88  9.53  0.99  0.95   0.04  0.11    0.0  [h,k,l]
   3  C m m m          0.000  -0.02  9.72  9.74  0.97  0.97   0.07  0.07    0.2  [h+k,-h+k,l]
   4  P 4/m            0.000   0.07  9.77  9.70  0.98  0.97   0.06  0.08    0.2  [h,k,l]
   5  P 1 2/m 1        0.000   0.25  9.91  9.66  0.99  0.97   0.03  0.08    0.0  [-h,-l,-k]
   6  P 1 2/m 1        0.000   0.22  9.89  9.67  0.99  0.97   0.04  0.08    0.0  [h,k,l]
   7  P 1 2/m 1        0.000   0.21  9.88  9.67  0.99  0.97   0.04  0.08    0.0  [-k,-h,-l]
   8  C 1 2/m 1        0.000  -0.01  9.72  9.73  0.97  0.97   0.07  0.07    0.2  [h-k,h+k,l]
   9  C 1 2/m 1        0.000  -0.02  9.71  9.73  0.97  0.97   0.07  0.07    0.2  [h+k,-h+k,l]
  10  P -1             0.000   0.21  9.91  9.70  0.99  0.97   0.03  0.08    0.0  [h,k,l]
and
Spacegroup          TotProb  SysAbsProb  Reindex  Conditions
<P 41 21 2> ( 92)    0.823     0.823              00l: l=4n, h00: h=2n (zones 1,2)
<P 43 21 2> ( 96)    0.823     0.823              00l: l=4n, h00: h=2n (zones 1,2)
..........
<P 4 21 2> ( 90)     0.095     0.095              h00: h=2n (zone 2)
..........
<P 42 21 2> ( 94)    0.077     0.077              00l: l=2n, h00: h=2n (zones 1,2)
Thus pointless suggests #92 or #96, the latter of which agrees with the PDB deposition. However, running CORRECT in #96 and specifying 103 103 130 90 90 90 as cell parameters, we obtain:
REFINED PARAMETERS: DISTANCE BEAM ORIENTATION CELL AXIS
USING 220320 INDEXED SPOTS
STANDARD DEVIATION OF SPOT POSITION (PIXELS)      1.17
STANDARD DEVIATION OF SPINDLE POSITION (DEGREES)  0.14
CRYSTAL MOSAICITY (DEGREES)      0.191
DIRECT BEAM COORDINATES (REC. ANGSTROEM)   -0.004790  0.004009  1.021014
DETECTOR COORDINATES (PIXELS) OF DIRECT BEAM    1027.19   1064.23
DETECTOR ORIGIN (PIXELS) AT                     1036.79   1056.20
CRYSTAL TO DETECTOR DISTANCE (mm)       209.52
LAB COORDINATES OF DETECTOR X-AXIS  1.000000  0.000000  0.000000
LAB COORDINATES OF DETECTOR Y-AXIS  0.000000  1.000000  0.000000
LAB COORDINATES OF ROTATION AXIS    0.999996  0.000901  0.002534
COORDINATES OF UNIT CELL A-AXIS     21.926    53.087    85.553
COORDINATES OF UNIT CELL B-AXIS      3.794    87.060   -54.995
COORDINATES OF UNIT CELL C-AXIS   -128.212    18.926    21.115
REC. CELL PARAMETERS   0.009704  0.009704  0.007616  90.000  90.000  90.000
UNIT CELL PARAMETERS    103.045   103.045   131.310  90.000  90.000  90.000
E.S.D. OF CELL PARAMETERS  2.1E-01 2.1E-01 2.1E-01 0.0E+00 0.0E+00 0.0E+00
SPACE GROUP NUMBER     96
...
     a          b          ISa
 7.890E+00  8.793E-04   12.01
...
NOTE: Friedel pairs are treated as different reflections.
SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION    NUMBER OF REFLECTIONS     COMPLETENESS  R-FACTOR  R-FACTOR  COMPARED  I/SIGMA  R-meas  Rmrgd-F  Anomal  SigAno   Nano
     LIMIT  OBSERVED  UNIQUE  POSSIBLE       OF DATA  observed  expected                                        Corr
      6.23     16770    2983      3017         98.9%      5.2%      6.1%     16752    26.20    5.7%     2.6%     55%   1.247   1223
      4.43     30598    5392      5393        100.0%      5.8%      6.2%     30596    25.25    6.3%     3.0%     50%   1.072   2420
      3.62     39822    6992      6994        100.0%      6.9%      6.6%     39820    22.27    7.6%     4.0%     32%   0.975   3215
      3.14     49620    8240      8242        100.0%      9.2%      8.7%     49619    17.14   10.1%     6.2%     19%   0.876   3847
      2.81     59388    9379      9379        100.0%     17.7%     18.1%     59387    10.44   19.3%    12.3%      0%   0.736   4410
      2.56     65652   10308     10310        100.0%     34.6%     39.1%     65652     6.08   37.7%    23.6%     -1%   0.680   4872
      2.37     71744   11258     11259        100.0%     71.3%     83.8%     71744     3.23   77.6%    52.1%     -2%   0.652   5352
      2.22     74888   12065     12082         99.9%    111.0%    116.9%     74888     1.98  121.2%    86.9%      2%   0.718   5753
      2.09     65727   12386     12874         96.2%    151.3%    176.1%     65517     1.12  168.0%   148.4%     -3%   0.631   5797
     total    474209   79003     79550         99.3%     10.3%     11.0%    473975     9.44   11.3%    17.2%     13%   0.772  36889
NUMBER OF REFLECTIONS IN SELECTED SUBSET OF IMAGES  492346
NUMBER OF REJECTED MISFITS                           17898
NUMBER OF SYSTEMATIC ABSENT REFLECTIONS                141
NUMBER OF ACCEPTED OBSERVATIONS                     474307
NUMBER OF UNIQUE ACCEPTED REFLECTIONS                79022
which is much worse than the spacegroup 16 statistics (compare the ISa values: 23.35 versus 12.01, a factor of 2!), so there may be something wrong with some of the assumptions we were making ...
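The ISa values printed in the two excerpts are consistent with ISa = 1/sqrt(a*b), where a and b are the error-model parameters listed next to them. The following minimal sketch reproduces both numbers from the tabulated a and b; the function name `isa` is a hypothetical helper introduced here, not part of XDS.

```python
import math

def isa(a: float, b: float) -> float:
    """Asymptotic I/sigma implied by the two error-model parameters a and b."""
    return 1.0 / math.sqrt(a * b)

# (a, b) pairs copied from the two CORRECT.LP excerpts above
runs = {
    "spacegroup 16": (6.058e+00, 3.027e-04),
    "spacegroup 96": (7.890e+00, 8.793e-04),
}

for label, (a, b) in runs.items():
    print(f"{label}: ISa = {isa(a, b):.2f}")
```

Both a and b grow when symmetry-related observations that do not really agree are forced together, which is why ISa is a convenient single number to compare between the two runs.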
Identifying a possible cause of the problem
The easiest thing one can do is to inspect INTEGRATE.LP, which lists scale factor, beam divergence and mosaicity for every frame. There's a jiffy called "scalefactors" which greps the relevant lines from INTEGRATE.LP ("scalefactors > scales.log"). Plotting the scale factor (column 3) demonstrates that "something happens" between frames 372 and 373 (of course one has to look at the table to find the exact numbers).
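Instead of reading the table by eye, the discontinuity can also be located programmatically. The sketch below assumes that each line of scales.log is whitespace-separated, with the frame number in column 1 (an assumption) and the scale factor in column 3 (as stated above); `parse_scales` and `find_jumps` are hypothetical helper names, the 10% threshold is arbitrary, and the sample values are made up for illustration rather than taken from the 1Y13 data.

```python
def parse_scales(text):
    """Return (frame, scale) pairs; frame is assumed to be column 1, scale is column 3."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            rows.append((int(fields[0]), float(fields[2])))
    return rows

def find_jumps(rows, threshold=0.1):
    """List (frame_before, frame_after, relative_change) wherever the scale
    factor changes by more than `threshold` between consecutive frames."""
    jumps = []
    for (f0, s0), (f1, s1) in zip(rows, rows[1:]):
        change = abs(s1 - s0) / s0
        if change > threshold:
            jumps.append((f0, f1, change))
    return jumps

# Made-up numbers for illustration only; not taken from the 1Y13 data.
example = """\
370 0 1.002
371 0 0.998
372 0 1.001
373 0 0.812
374 0 0.815
"""

print(find_jumps(parse_scales(example)))
```

With a real scales.log one would pass the file contents instead of `example`, for instance `find_jumps(parse_scales(open("scales.log").read()))`.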
It should be noted that any abrupt change in conditions during the experiment is going to spoil the resulting data in one way or another. This is especially true for a SAD experiment, which is supposed to give accurate values for the tiny differences in intensities between Friedel-related reflections.
Solving the problem
At this point it is good to look at the data for experiment E2. We find exactly the same problem of bad ISa and high "STANDARD DEVIATION OF SPOT POSITION (PIXELS)" when reducing frames 1-591 in one run of xds.
With this knowledge, we are led, for E1, to reduce frames 1-372 and 373-592 separately, in spacegroup 96. For E2, we use frames 1-369 and 371-591. Frame E2-370 has a very high scale factor and is left out.
This is also a good time to closely inspect the headers of the frames:
% grep --binary-files=text DATE ALS/821/1y13/j1603b3PK_1_E1_37?.img
gives
ALS/821/1y13/j1603b3PK_1_E1_370.img:DATE=Sun Jun 27 08:55:51 2004;
ALS/821/1y13/j1603b3PK_1_E1_371.img:DATE=Sun Jun 27 08:56:00 2004;
ALS/821/1y13/j1603b3PK_1_E1_372.img:DATE=Sun Jun 27 08:56:08 2004;
ALS/821/1y13/j1603b3PK_1_E1_373.img:DATE=Sun Jun 27 09:19:45 2004;
ALS/821/1y13/j1603b3PK_1_E1_374.img:DATE=Sun Jun 27 09:19:54 2004;
ALS/821/1y13/j1603b3PK_1_E1_375.img:DATE=Sun Jun 27 09:20:02 2004;
ALS/821/1y13/j1603b3PK_1_E1_376.img:DATE=Sun Jun 27 09:20:10 2004;
ALS/821/1y13/j1603b3PK_1_E1_377.img:DATE=Sun Jun 27 09:20:58 2004;
ALS/821/1y13/j1603b3PK_1_E1_378.img:DATE=Sun Jun 27 09:21:08 2004;
ALS/821/1y13/j1603b3PK_1_E1_379.img:DATE=Sun Jun 27 09:21:17 2004;
and
% grep --binary-files=text DATE ALS/821/1y13/j1603b3PK_1_E2_3[67]?.img
gives
ALS/821/1y13/j1603b3PK_1_E2_366.img:DATE=Sun Jun 27 08:55:15 2004;
ALS/821/1y13/j1603b3PK_1_E2_367.img:DATE=Sun Jun 27 08:55:23 2004;
ALS/821/1y13/j1603b3PK_1_E2_368.img:DATE=Sun Jun 27 08:55:32 2004;
ALS/821/1y13/j1603b3PK_1_E2_369.img:DATE=Sun Jun 27 08:56:19 2004;
ALS/821/1y13/j1603b3PK_1_E2_370.img:DATE=Sun Jun 27 08:56:28 2004;
ALS/821/1y13/j1603b3PK_1_E2_371.img:DATE=Sun Jun 27 09:19:26 2004;
ALS/821/1y13/j1603b3PK_1_E2_372.img:DATE=Sun Jun 27 09:19:34 2004;
ALS/821/1y13/j1603b3PK_1_E2_373.img:DATE=Sun Jun 27 09:20:22 2004;
ALS/821/1y13/j1603b3PK_1_E2_374.img:DATE=Sun Jun 27 09:20:30 2004;
ALS/821/1y13/j1603b3PK_1_E2_375.img:DATE=Sun Jun 27 09:20:38 2004;
ALS/821/1y13/j1603b3PK_1_E2_376.img:DATE=Sun Jun 27 09:20:47 2004;
thus proving that both datasets were interrupted for about 23 minutes around frame 370.
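The length of the interruption follows directly from the DATE values on either side of the gap. A small sketch, using only the four timestamps copied from the grep output above; `parse_date` and `gap_seconds` are hypothetical helper names, and the format string assumes the English day and month abbreviations seen in these headers.

```python
from datetime import datetime

def parse_date(value):
    """Parse a header timestamp such as 'Sun Jun 27 08:56:08 2004'."""
    return datetime.strptime(value, "%a %b %d %H:%M:%S %Y")

def gap_seconds(pair):
    """Seconds elapsed between two (frame, DATE) entries."""
    (_, before), (_, after) = pair
    return (parse_date(after) - parse_date(before)).total_seconds()

# (frame, DATE value) on either side of the interruption, copied from above
e1 = [(372, "Sun Jun 27 08:56:08 2004"), (373, "Sun Jun 27 09:19:45 2004")]
e2 = [(370, "Sun Jun 27 08:56:28 2004"), (371, "Sun Jun 27 09:19:26 2004")]

for name, pair in (("E1", e1), ("E2", e2)):
    minutes = gap_seconds(pair) / 60
    print(f"{name}: {minutes:.1f} min between frames {pair[0][0]} and {pair[1][0]}")
```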
The really weird thing here is that both datasets appear to have been collected at the same time, but at different wavelengths (E1 at 0.9794 Å, E2 at 0.9184 Å). Yet the individual parts merge in an unexpected way. Using the following XSCALE.INP:
UNIT_CELL_CONSTANTS=103.316 103.316 131.456 90.000 90.000 90.000
SPACE_GROUP_NUMBER=96
OUTPUT_FILE=temp.ahkl
INPUT_FILE=../e1_1-372/XDS_ASCII.HKL
INPUT_FILE=../e1_373-592/XDS_ASCII.HKL
INPUT_FILE=../e2_1-369/XDS_ASCII.HKL
INPUT_FILE=../e2_371-591/XDS_ASCII.HKL
and running xscale, we obtain in XSCALE.LP:
CORRELATIONS BETWEEN INPUT DATA SETS AFTER CORRECTIONS
DATA SETS  NUMBER OF COMMON  CORRELATION   RATIO OF COMMON     B-FACTOR
 #i   #j     REFLECTIONS     BETWEEN i,j  INTENSITIES (i/j)  BETWEEN i,j
  1    2        15943           0.978          1.0002           0.0106
  1    3        22366           1.000          1.0012          -0.0008
  2    3        15801           0.977          0.9983           0.0557
  1    4        15648           0.979          0.9988           0.0541
  2    4        14862           0.999          1.0024          -0.0007
  3    4        15524           0.978          0.9999          -0.0015
which means that e1_1-372 correlates well (1.000) with e2_1-369, and e1_373-592 well (0.999) with e2_371-591, but the crosswise correlations are consistently lower (0.978, 0.977, 0.979, 0.978). The adjustment to the error model confirms this:
     a          b          ISa    ISa0   INPUT DATA SET
 6.112E+00  1.429E-03   10.70   22.37  ../e1_1-372/XDS_ASCII.HKL
 1.074E+01  1.825E-03    7.14   23.79  ../e1_373-592/XDS_ASCII.HKL
 5.707E+00  1.621E-03   10.40   22.82  ../e2_1-369/XDS_ASCII.HKL
 8.547E+00  1.796E-03    8.07   24.17  ../e2_371-591/XDS_ASCII.HKL
telling us that "if we merge these datasets together, their error estimates have to be increased a lot". However, if we switch to
UNIT_CELL_CONSTANTS=103.316 103.316 131.456 90.000 90.000 90.000
SPACE_GROUP_NUMBER=96
OUTPUT_FILE=firstparts.ahkl
INPUT_FILE=../e1_1-372/XDS_ASCII.HKL
INPUT_FILE=../e2_1-369/XDS_ASCII.HKL
OUTPUT_FILE=secondparts.ahkl
INPUT_FILE=../e1_373-592/XDS_ASCII.HKL
INPUT_FILE=../e2_371-591/XDS_ASCII.HKL
we obtain
     a          b          ISa    ISa0   INPUT DATA SET
 6.120E+00  3.673E-04   21.09   22.37  ../e1_1-372/XDS_ASCII.HKL
 5.713E+00  3.819E-04   21.41   22.82  ../e2_1-369/XDS_ASCII.HKL
 5.639E+00  3.151E-04   23.72   23.79  ../e1_373-592/XDS_ASCII.HKL
 5.289E+00  3.258E-04   24.09   24.17  ../e2_371-591/XDS_ASCII.HKL
proving that the second parts of datasets E1 and E2 should be treated separately from the first parts.
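The same pairing can be read off the correlation table mechanically. The sketch below groups the four partial datasets by linking any pair whose correlation exceeds a cutoff; the correlation values are copied from the XSCALE.LP excerpt above, while the cutoff of 0.99 and the helper name `group_by_correlation` are assumptions made for this illustration, not something XSCALE does itself.

```python
# Pairwise correlations copied from the XSCALE.LP excerpt above: (i, j) -> CC
cc = {(1, 2): 0.978, (1, 3): 1.000, (2, 3): 0.977,
      (1, 4): 0.979, (2, 4): 0.999, (3, 4): 0.978}
names = {1: "e1_1-372", 2: "e1_373-592", 3: "e2_1-369", 4: "e2_371-591"}

def group_by_correlation(cc, members, cutoff=0.99):
    """Single-linkage grouping: two datasets end up in the same group
    whenever some pair linking them correlates above `cutoff`."""
    group = {m: {m} for m in members}
    for (i, j), value in cc.items():
        if value > cutoff and group[i] is not group[j]:
            merged = group[i] | group[j]
            for m in merged:
                group[m] = merged
    unique = {frozenset(g) for g in group.values()}
    return sorted(sorted(g) for g in unique)

for g in group_by_correlation(cc, names):
    print([names[m] for m in g])
```

With only four datasets the grouping is obvious by eye; the point of the sketch is that the two groups are cleanly separated, since no crosswise pair comes close to the cutoff.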
Upon inspection of the cell parameters, we find that the cell axes of the second "halves" are shorter by a factor of 0.9908 when compared with the first parts. This suggests that they were collected at a longer wavelength! But then the wavelength values in the headers are most likely completely wrong: we can speculate that the two first parts were collected at the SeMet peak wavelength, and the two second parts at the inflection wavelength.
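Why a shorter apparent cell points to a longer true wavelength can be seen from Bragg's law. The short derivation below assumes that the detector distance and all other geometry are correct, so that the diffraction angle θ of each reflection is measured correctly; the symbols λ_true, λ_assumed, d_true and d_apparent are introduced here for illustration only.

```latex
\begin{align*}
d_{\text{true}} &= \frac{\lambda_{\text{true}}}{2\sin\theta}, &
d_{\text{apparent}} &= \frac{\lambda_{\text{assumed}}}{2\sin\theta}, \\[4pt]
\frac{d_{\text{apparent}}}{d_{\text{true}}}
  &= \frac{\lambda_{\text{assumed}}}{\lambda_{\text{true}}} .
\end{align*}
```

All cell axes scale with d, so processing with a wavelength that is too short yields a cell that is too small by the same factor. If both parts of a dataset were processed with the same header wavelength and the crystal itself did not change, an axis ratio of 0.9908 (second part over first part) would correspond to a true wavelength for the second part that is longer by a factor of 1/0.9908 ≈ 1.0093.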
Although