1Y13: Difference between revisions

6,396 bytes added ,  17 March 2011
Line 1: Line 1:
The structure is [http://www.rcsb.org/pdb/explore/explore.do?structureId=1Y13 deposited] in the PDB, solved with SAD and refined at a resolution of 2.2 Å in spacegroup P4(3)2(1)2 (#96).
The data for this project were provided by Jürgen Bosch (SGPP) and are linked to [http://bl831.als.lbl.gov/example_data_sets/ACA2011/DPWTP-website/index.html the ACA 2011 workshop website].
There are two high-resolution (2 Å) datasets, E1 (wavelength 0.9794 Å) and E2 (wavelength 0.9174 Å), collected (with 0.25° increments) at an ALS beamline on June 27, 2004, and a weaker dataset collected earlier at an SSRL beamline. We will only use the former two datasets here.


== Dataset E1 ==


Use [[generate_XDS.INP]] and run [[xds]] once. Based on R-factors in the resulting CORRECT.LP, and an inspection of BKGPIX.cbf, I modified  XDS.INP to have
  INCLUDE_RESOLUTION_RANGE=40 2.1                      ! too weak beyond 2.1 Å
  VALUE_RANGE_FOR_TRUSTED_DETECTOR_PIXELS=8000. 30000.  ! raised from 7000 30000 to mask beamstop
and ran xds again.  
 
=== Identifying the problem ===
 
This is the excerpt from [[CORRECT.LP]] :
   
   
  SPACE-GROUP        UNIT CELL CONSTANTS            UNIQUE  Rmeas  COMPARED  LATTICE-
Line 45: Line 51:
  SPACE GROUP NUMBER    16


So CORRECT chooses an orthorhombic spacegroup. Please note that the "UNIT CELL PARAMETERS" and the "E.S.D. OF CELL PARAMETERS" lines give a difference between the a and the b axis of 0.359 Å, whereas the sum of their E.S.D.s is only 0.13+0.086 = 0.22 Å.


The file continues:  
Line 81: Line 87:


Some comments:
* the "STANDARD DEVIATION OF SPOT POSITION (PIXELS)" is significantly higher (1.01) than the values reported for the 5°-batches in INTEGRATE.LP (about 0.6). This suggests that the geometry refinement has to deal with inconsistent data.
* CORRECT obviously indicates an orthorhombic spacegroup.  
* the number of MISFITS is higher than 1%. From the first long (fine-grained in resolution) table in CORRECT.LP we learn that the misfits are due to faint high-resolution ice rings - so this is a problem intrinsic to the data, and not to their mode of processing.


To my surprise, pointless does not agree with CORRECT's standpoint:
Line 179: Line 186:
which is much worse than the spacegroup 19 statistics (compare the ISa values - they differ by a factor of 2 !) so there may be something wrong with some assumptions we were making ...


=== Identifying a possible cause of the problem ===
 
The easiest thing one can do is to inspect INTEGRATE.LP - this lists scale factor, beam divergence and mosaicity for every frame. There's a [[jiffies|jiffy]] called "scalefactors" which greps the relevant lines from INTEGRATE.LP ("scalefactors > scales.log"). This shows the scale factor (column 3):
[[File:1y13-e1-scales.png]]
demonstrating that "something happens" between frames 372 and 373 (of course one has to look at the table to find the exact numbers).
'''It should be noted that any abrupt change in conditions during the experiment is going to spoil the resulting data in one way or another. This is especially true for a SAD experiment, which is supposed to give accurate values for the tiny differences in intensities between Friedel-related reflections.'''
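Instead of reading the frame numbers off the plot, the jump can also be located programmatically. The following is only a minimal sketch: it assumes that each data line of scales.log carries the frame number in column 1 and the scale factor in column 3, and the helper name <code>largest_scale_jump</code> as well as the example numbers are hypothetical.

```python
def largest_scale_jump(lines):
    """Return (frame_before, frame_after, ratio) for the largest relative
    change in scale factor between consecutive frames, or None."""
    records = []
    for line in lines:
        fields = line.split()
        try:
            # Assumed layout: column 1 = frame number, column 3 = scale factor.
            records.append((int(fields[0]), float(fields[2])))
        except (IndexError, ValueError):
            continue  # skip headers and anything that is not a data line
    best = None
    for (f0, s0), (f1, s1) in zip(records, records[1:]):
        if s0 <= 0 or s1 <= 0:
            continue  # a non-positive scale cannot be compared as a ratio
        ratio = max(s0, s1) / min(s0, s1)
        if best is None or ratio > best[2]:
            best = (f0, f1, ratio)
    return best


# Made-up numbers for illustration only; they are NOT taken from the real data.
example = [
    "  370   0 1.003",
    "  371   0 1.001",
    "  372   0 1.004",
    "  373   0 0.871",
    "  374   0 0.873",
]
print(largest_scale_jump(example)[:2])
```

With a real file one would pass the open file object, e.g. <code>largest_scale_jump(open("scales.log"))</code>. A single outlier frame shows up as two large jumps in a row, so the table should still be inspected by eye.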
=== Solving the problem ===
At this point it is good to look at the data for experiment E2. We find exactly the same problem of bad ISa and high "STANDARD DEVIATION OF SPOT POSITION (PIXELS)" when reducing frames 1-591 in one run of xds.
With this knowledge, we are led, for E1, to reduce frames 1-372 and 373-592 separately, in spacegroup 96. For E2, we use frames 1-369 and 371-591, respectively. Frame E2-370 has a very high scale factor.
This is also a good time to closely inspect the headers of the frames:
  % grep --binary-files=text DATE ALS/821/1y13/j1603b3PK_1_E1_37?.img
gives
  ALS/821/1y13/j1603b3PK_1_E1_370.img:DATE=Sun Jun 27 08:55:51 2004;
  ALS/821/1y13/j1603b3PK_1_E1_371.img:DATE=Sun Jun 27 08:56:00 2004;
  ALS/821/1y13/j1603b3PK_1_E1_372.img:DATE=Sun Jun 27 08:56:08 2004;
  ALS/821/1y13/j1603b3PK_1_E1_373.img:DATE=Sun Jun 27 09:19:45 2004;
  ALS/821/1y13/j1603b3PK_1_E1_374.img:DATE=Sun Jun 27 09:19:54 2004;
  ALS/821/1y13/j1603b3PK_1_E1_375.img:DATE=Sun Jun 27 09:20:02 2004;
  ALS/821/1y13/j1603b3PK_1_E1_376.img:DATE=Sun Jun 27 09:20:10 2004;
  ALS/821/1y13/j1603b3PK_1_E1_377.img:DATE=Sun Jun 27 09:20:58 2004;
  ALS/821/1y13/j1603b3PK_1_E1_378.img:DATE=Sun Jun 27 09:21:08 2004;
  ALS/821/1y13/j1603b3PK_1_E1_379.img:DATE=Sun Jun 27 09:21:17 2004;
and
  % grep --binary-files=text DATE ALS/821/1y13/j1603b3PK_1_E2_3[67]?.img
gives
  ALS/821/1y13/j1603b3PK_1_E2_366.img:DATE=Sun Jun 27 08:55:15 2004;
  ALS/821/1y13/j1603b3PK_1_E2_367.img:DATE=Sun Jun 27 08:55:23 2004;
  ALS/821/1y13/j1603b3PK_1_E2_368.img:DATE=Sun Jun 27 08:55:32 2004;
  ALS/821/1y13/j1603b3PK_1_E2_369.img:DATE=Sun Jun 27 08:56:19 2004;
  ALS/821/1y13/j1603b3PK_1_E2_370.img:DATE=Sun Jun 27 08:56:28 2004;
  ALS/821/1y13/j1603b3PK_1_E2_371.img:DATE=Sun Jun 27 09:19:26 2004;
  ALS/821/1y13/j1603b3PK_1_E2_372.img:DATE=Sun Jun 27 09:19:34 2004;
  ALS/821/1y13/j1603b3PK_1_E2_373.img:DATE=Sun Jun 27 09:20:22 2004;
  ALS/821/1y13/j1603b3PK_1_E2_374.img:DATE=Sun Jun 27 09:20:30 2004;
  ALS/821/1y13/j1603b3PK_1_E2_375.img:DATE=Sun Jun 27 09:20:38 2004;
  ALS/821/1y13/j1603b3PK_1_E2_376.img:DATE=Sun Jun 27 09:20:47 2004;
thus proving that both datasets were interrupted for about 23 minutes around frame 370.
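The length of the interruption can be computed from the grep output rather than estimated by eye. The sketch below is one possible way to do it; the helper names <code>frame_times</code> and <code>longest_gap</code> are hypothetical, and the date parsing assumes an English (C) locale because of the weekday and month names.

```python
import re
from datetime import datetime

# Matches the frame number before ".img" and the DATE value up to the ";".
DATE_RE = re.compile(r"_(\d+)\.img:DATE=(.+?);")


def frame_times(grep_lines):
    """Return a list of (frame_number, datetime) parsed from grep output."""
    times = []
    for line in grep_lines:
        match = DATE_RE.search(line)
        if match:
            stamp = datetime.strptime(match.group(2), "%a %b %d %H:%M:%S %Y")
            times.append((int(match.group(1)), stamp))
    return times


def longest_gap(grep_lines):
    """Return (frame_before, frame_after, seconds) for the longest pause."""
    times = frame_times(grep_lines)
    gaps = [(t1 - t0, f0, f1) for (f0, t0), (f1, t1) in zip(times, times[1:])]
    delta, f0, f1 = max(gaps)
    return f0, f1, delta.total_seconds()


# Four of the E1 header lines quoted above.
e1_lines = [
    "ALS/821/1y13/j1603b3PK_1_E1_371.img:DATE=Sun Jun 27 08:56:00 2004;",
    "ALS/821/1y13/j1603b3PK_1_E1_372.img:DATE=Sun Jun 27 08:56:08 2004;",
    "ALS/821/1y13/j1603b3PK_1_E1_373.img:DATE=Sun Jun 27 09:19:45 2004;",
    "ALS/821/1y13/j1603b3PK_1_E1_374.img:DATE=Sun Jun 27 09:19:54 2004;",
]
print(longest_gap(e1_lines))
```

For these E1 lines the longest pause is between frames 372 and 373 and lasts 1417 seconds, i.e. a little under 24 minutes.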
The really weird thing here is that both datasets appear to have been collected at the same time, but at different wavelengths (E1 at 0.9794 Å, E2 at 0.9184 Å), and yet the individual parts merge in an unexpected way. Using the following [[XSCALE.INP]]:
  UNIT_CELL_CONSTANTS=103.316  103.316  131.456  90.000  90.000  90.000
  SPACE_GROUP_NUMBER=96
  OUTPUT_FILE=temp.ahkl
  INPUT_FILE=../e1_1-372/XDS_ASCII.HKL
  INPUT_FILE=../e1_373-592/XDS_ASCII.HKL
  INPUT_FILE=../e2_1-369/XDS_ASCII.HKL
  INPUT_FILE=../e2_371-591/XDS_ASCII.HKL
and running [[xscale]], we obtain in XSCALE.LP:
       CORRELATIONS BETWEEN INPUT DATA SETS AFTER CORRECTIONS
  DATA SETS  NUMBER OF COMMON  CORRELATION   RATIO OF COMMON    B-FACTOR
   #i   #j     REFLECTIONS     BETWEEN i,j  INTENSITIES (i/j)  BETWEEN i,j
    1    2        15943           0.978          1.0002           0.0106
    1    3        22366           1.000          1.0012          -0.0008
    2    3        15801           0.977          0.9983           0.0557
    1    4        15648           0.979          0.9988           0.0541
    2    4        14862           0.999          1.0024          -0.0007
    3    4        15524           0.978          0.9999          -0.0015
which means that e1_1-372 correlates well (1.000) with e2_1-369, and e1_373-592 well (0.999) with e2_371-591, but the crosswise correlations are consistently low (0.978, 0.977, 0.979, 0.978). The adjustment to the error model proves this:
      a          b          ISa    ISa0   INPUT DATA SET
  6.112E+00  1.429E-03    10.70   22.37  ../e1_1-372/XDS_ASCII.HKL
  1.074E+01  1.825E-03     7.14   23.79  ../e1_373-592/XDS_ASCII.HKL
  5.707E+00  1.621E-03    10.40   22.82  ../e2_1-369/XDS_ASCII.HKL
  8.547E+00  1.796E-03     8.07   24.17  ../e2_371-591/XDS_ASCII.HKL
telling us that "if we merge these datasets together, their error estimates have to be increased a lot". However, if we switch to
  UNIT_CELL_CONSTANTS=103.316  103.316  131.456  90.000  90.000  90.000
  SPACE_GROUP_NUMBER=96
  OUTPUT_FILE=firstparts.ahkl
  INPUT_FILE=../e1_1-372/XDS_ASCII.HKL
  INPUT_FILE=../e2_1-369/XDS_ASCII.HKL
  OUTPUT_FILE=secondparts.ahkl
  INPUT_FILE=../e1_373-592/XDS_ASCII.HKL
  INPUT_FILE=../e2_371-591/XDS_ASCII.HKL
we obtain
      a          b          ISa    ISa0   INPUT DATA SET
  6.120E+00  3.673E-04    21.09   22.37  ../e1_1-372/XDS_ASCII.HKL
  5.713E+00  3.819E-04    21.41   22.82  ../e2_1-369/XDS_ASCII.HKL
  5.639E+00  3.151E-04    23.72   23.79  ../e1_373-592/XDS_ASCII.HKL
  5.289E+00  3.258E-04    24.09   24.17  ../e2_371-591/XDS_ASCII.HKL
proving that the second parts of datasets E1 and E2 should be treated separately from the first parts.
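The ISa columns in both tables can be reproduced from the a and b columns. The sketch below assumes the relation ISa = 1/sqrt(a*b); this is an assumption, but it agrees with all eight rows quoted above to within rounding. The function name <code>isa</code> and the two dictionaries are hypothetical names used only for this illustration.

```python
import math


def isa(a, b):
    """Asymptotic I/sigma, assuming ISa = 1/sqrt(a*b) for error-model terms a, b."""
    return 1.0 / math.sqrt(a * b)


# (a, b) pairs copied from the two XSCALE.LP excerpts above.
all_four_merged = {
    "e1_1-372": (6.112, 1.429e-3),
    "e1_373-592": (10.74, 1.825e-3),
    "e2_1-369": (5.707, 1.621e-3),
    "e2_371-591": (8.547, 1.796e-3),
}
parts_merged_separately = {
    "e1_1-372": (6.120, 3.673e-4),
    "e2_1-369": (5.713, 3.819e-4),
    "e1_373-592": (5.639, 3.151e-4),
    "e2_371-591": (5.289, 3.258e-4),
}

for name in all_four_merged:
    before = isa(*all_four_merged[name])
    after = isa(*parts_merged_separately[name])
    print(f"{name:12s}  ISa merged together: {before:6.2f}   separately: {after:6.2f}")
```

The a term barely changes for the first parts; it is mostly the b term that drops by a factor of four to six when the parts are scaled separately, and this is what brings ISa back close to ISa0.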
Upon inspection of the cell parameters, we find that the cell axes of the second "halves" are shorter by a factor of 0.9908 when compared with the first parts. This suggests that they were collected at a longer wavelength! But then the wavelength values in the headers are most likely completely wrong: we can speculate that the two first parts were collected at the SeMet peak wavelength, and the two second parts at the inflection wavelength.
Although