1Y13

(revision of 24 March 2020)

The structure is [http://www.rcsb.org/pdb/explore/explore.do?structureId=1Y13 deposited] in the PDB, solved with SAD and refined at a resolution of 2.2 Å in spacegroup P4(3)2(1)2 (#96).
The data for this project were provided by Jürgen Bosch (SGPP) and are linked to [http://bl831.als.lbl.gov/example_data_sets/ACA2011/DPWTP-website/index.html the ACA 2011 workshop website] and [https://{{SERVERNAME}}/pub/xds-datared/1y13/ here].
There are two high-resolution (2 Å) datasets, E1 (wavelength 0.9794 Å) and E2 (wavelength 0.9174 Å), collected in 0.25° increments at an ALS beamline on June 27, 2004, and a weaker dataset collected earlier at an SSRL beamline. We will only use the former two datasets here.

...
       a        b          ISa
  6.058E+00  3.027E-04  23.35
 
  ...
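The ISa value is derived from the two error-model parameters a and b printed next to it: ISa = 1/sqrt(a·b), the limiting I/sigma that the error model allows for very strong reflections. A minimal Python check (the function name is ours, not part of XDS) reproduces the number from CORRECT.LP:

```python
import math


def isa_from_error_model(a: float, b: float) -> float:
    """Return ISa = 1/sqrt(a*b) for the error-model parameters a and b
    reported by CORRECT (ISa is the limiting I/sigma of strong reflections)."""
    if a <= 0 or b <= 0:
        raise ValueError("a and b must be positive")
    return 1.0 / math.sqrt(a * b)


# Values copied from the CORRECT.LP excerpt above
isa = isa_from_error_model(6.058e00, 3.027e-04)
print(f"ISa = {isa:.2f}")  # ISa = 23.35
```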
   
   
...
* the number of MISFITS is higher than 1%. From the first long table in CORRECT.LP (fine-grained in resolution) we learn that the misfits are due to faint high-resolution ice rings, so this is a problem intrinsic to the data, and not to the way they were processed.

To my surprise, pointless (run as "pointless xdsin XDS_ASCII.HKL") does not agree with CORRECT's assessment:
<pre>
Scores for each symmetry element
...
thus proving that both datasets were interrupted for 20 minutes around frame 370.
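Spotting such an interruption amounts to looking for the largest interval between the time stamps of consecutive frames. The sketch below assumes that the DATE values have already been read from the image headers and converted to seconds (that step is not shown). The time stamps used here are synthetic, generated to mimic an assumed 3-second frame rhythm with a 20-minute pause after frame 370; they are not the real 1Y13 header values.

```python
from typing import List, Tuple


def largest_gap(timestamps: List[float], first_frame: int = 1) -> Tuple[int, float]:
    """Return (frame number just before the largest gap, gap length in seconds).

    timestamps[i] is the time (s) at which frame first_frame + i was recorded;
    the list is assumed to be in frame order.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two frames")
    gap, frame = max(
        (t_next - t_prev, first_frame + i)
        for i, (t_prev, t_next) in enumerate(zip(timestamps, timestamps[1:]))
    )
    return frame, gap


# Synthetic example (hypothetical numbers): 740 frames, 3 s apart,
# with an extra 1200 s (20 min) pause between frames 370 and 371.
stamps = [3.0 * (n - 1) + (1200.0 if n > 370 else 0.0) for n in range(1, 741)]
frame, gap = largest_gap(stamps)
print(f"largest gap after frame {frame}: {gap:.0f} s")
```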


Interestingly, both datasets appear to have been collected at the same time, but at different wavelengths (E1 at 0.9794 Å, E2 at 0.9184 Å), and yet the individual parts merge as follows, using this XSCALE.INP:
  UNIT_CELL_CONSTANTS=103.316  103.316  131.456  90.000  90.000  90.000
  SPACE_GROUP_NUMBER=96
  ...
proving that the second parts of datasets E1 and E2 should be treated separately from the first parts.

Upon inspection of the cell parameters, we find that the cell axes of the second "halves" are shorter by a factor of 0.9908 when compared with the first parts. This suggests either that they were collected at a longer wavelength, or that radiation damage changed the cell parameters during the 20-minute break. Radiation damage usually makes the cell axes longer (Ravelli ''et al.'' (2002), J. Synchrotron Rad. 9, 355-360), but this may be the exception to the rule! Maybe the crystal was even exposed to the beam during that time, in an attempt at radiation-damage induced phasing (see e.g. Ravelli ''et al.'' Structure 11 (2003), 217-220).
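To put a number on the first explanation: if the cell of the second parts was refined with the header wavelength while the true wavelength was different, Bragg's law (d = λ / 2 sin θ, with θ fixed by the detector geometry) makes the apparent cell scale by λ_header/λ_true. The sketch below assumes that detector distance and beam position are correct and that the true cell did not change; it only quantifies the wavelength hypothesis and cannot distinguish it from a damage-induced cell change.

```python
def implied_wavelength(header_wavelength: float, cell_ratio: float) -> float:
    """Wavelength that would make an apparent cell come out scaled by
    cell_ratio (apparent/true), given that the cell was refined with
    header_wavelength.

    Bragg: apparent d = header_wavelength / (2 sin theta) and
           true d = true_wavelength / (2 sin theta) for the same measured theta,
    so apparent/true = header_wavelength / true_wavelength.
    """
    if cell_ratio <= 0.0:
        raise ValueError("cell_ratio must be positive")
    return header_wavelength / cell_ratio


lam_e1 = implied_wavelength(0.9794, 0.9908)  # E1 header wavelength in Angstrom
print(f"E1 second part would need lambda = {lam_e1:.4f} A")
```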


The almost-simultaneous DATEs in the headers may be explained by a wavelength-switching measuring strategy which alternately collects 4 frames at one wavelength into E1, then changes the wavelength and collects 4 frames into E2.
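Why such a strategy gives almost identical DATEs can be seen from a toy schedule. This sketch is hypothetical: the block size of 4 frames comes from the strategy just described, but the exposure time per frame (t_frame) and the time needed to change the wavelength (t_switch) are made-up parameters. With interleaved blocks, frame n of E2 is recorded a constant time after frame n of E1, namely one block of exposures plus one wavelength change.

```python
from typing import Dict, List


def interleaved_schedule(n_frames: int, block: int = 4, t_frame: float = 3.0,
                         t_switch: float = 0.0) -> Dict[str, List[float]]:
    """Start times (s) of n_frames frames in each of E1 and E2 when blocks of
    `block` frames are collected alternately at two wavelengths."""
    if block < 1:
        raise ValueError("block must be >= 1")
    times: Dict[str, List[float]] = {"E1": [], "E2": []}
    t = 0.0
    while len(times["E2"]) < n_frames:
        for name in ("E1", "E2"):
            for _ in range(min(block, n_frames - len(times[name]))):
                times[name].append(t)
                t += t_frame
            t += t_switch  # time to change the wavelength
    return times


sched = interleaved_schedule(n_frames=12, block=4, t_frame=3.0, t_switch=1.0)
offsets = [t2 - t1 for t1, t2 in zip(sched["E1"], sched["E2"])]
print(offsets)  # each offset: 4 * 3.0 + 1.0 = 13.0 s
```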


So this little detective work appears to give us useful information about what happened on the morning of Sunday June 27, 2004 at ALS beamline 821, but some questions remain.

== Further analysis of datasets E1 and E2 ==
...
R_meas mapped on the detector, showing elevated R_meas at the location of the ice rings.

== Solving the structure with pseudo-SAD ==

It appears reasonable to discard the "second parts", since they are strongly influenced by radiation damage. Then, we could either
# merge together (into one output file) the two first parts of E1 and E2, thus obtaining a single pseudo-SAD dataset. The reason for doing this is that the anomalous signal of both datasets is so strong, and their (isomorphous) difference so weak (after all, the correlation coefficient is 1.000!), or
# keep the first parts of E1 (inflection, according to the documentation) and E2 (high-energy remote) separate, and treat them as MAD (or rather, DAD).

=== First try ===
Let's look at the XSCALE statistics for the merged-together "firstparts":

       NOTE:      Friedel pairs are treated as different reflections.
  ...
This looks reasonable, although the absolute value of CCall is so low that there is little hope that the structure can be solved with this amount of information. And indeed, SHELXE did not show a difference between the two hands (in fact we even know that the "original hand" is the correct one, since the inverted hand would correspond to spacegroup #92!).
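The remark about spacegroup #92 follows from P4(3)2(1)2 (#96) being one member of the 11 enantiomorphic pairs of space groups: inverting the hand of the substructure turns such a space group into its partner, here P4(1)2(1)2 (#92). For all other space groups the number stays the same. A small lookup table (written for this page, not taken from any crystallographic program) makes this explicit:

```python
# The 11 enantiomorphic space-group pairs (International Tables numbers)
ENANTIOMORPH_PAIRS = [
    (76, 78),    # P4(1)      / P4(3)
    (91, 95),    # P4(1)22    / P4(3)22
    (92, 96),    # P4(1)2(1)2 / P4(3)2(1)2
    (144, 145),  # P3(1)      / P3(2)
    (151, 153),  # P3(1)12    / P3(2)12
    (152, 154),  # P3(1)21    / P3(2)21
    (169, 170),  # P6(1)      / P6(5)
    (171, 172),  # P6(2)      / P6(4)
    (178, 179),  # P6(1)22    / P6(5)22
    (180, 181),  # P6(2)22    / P6(4)22
    (212, 213),  # P4(3)32    / P4(1)32
]

_PARTNER = {}
for sg1, sg2 in ENANTIOMORPH_PAIRS:
    _PARTNER[sg1] = sg2
    _PARTNER[sg2] = sg1


def space_group_of_inverted_hand(sg_number: int) -> int:
    """Space-group number after inverting the hand of a substructure:
    the enantiomorphic partner if there is one, otherwise unchanged."""
    return _PARTNER.get(sg_number, sg_number)


print(space_group_of_inverted_hand(96))  # 92
```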


=== Second try: correcting radiation damage by 0-dose extrapolation ===

Since we noted significant radiation damage, we could try to correct for it. All we have to do is ask XSCALE to do zero-dose extrapolation:
...
CRYSTAL_NAME=a
</pre>
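The idea behind zero-dose extrapolation is to model how each intensity changes with accumulated dose and to report its value extrapolated back to zero dose. XSCALE's own implementation is more elaborate and is not reproduced here; the following is only a simplified per-reflection sketch, assuming a linear intensity-vs-dose model, dose in arbitrary but consistent units, and a hypothetical set of measurements:

```python
from typing import Sequence, Tuple


def zero_dose_fit(doses: Sequence[float], intensities: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit I = I0 + slope * dose for repeated measurements of one
    reflection; returns (I0, slope), I0 being the zero-dose extrapolated intensity."""
    n = len(doses)
    if n != len(intensities) or n < 2:
        raise ValueError("need at least two (dose, intensity) pairs")
    mean_d = sum(doses) / n
    mean_i = sum(intensities) / n
    sxx = sum((d - mean_d) ** 2 for d in doses)
    if sxx == 0:
        raise ValueError("observations must span more than one dose")
    sxy = sum((d - mean_d) * (i - mean_i) for d, i in zip(doses, intensities))
    slope = sxy / sxx
    return mean_i - slope * mean_d, slope


# Hypothetical symmetry-equivalent observations of one reflection, decaying with dose
i0, slope = zero_dose_fit([10, 20, 30, 40], [95.0, 90.0, 85.0, 80.0])
print(i0, slope)  # 100.0 -0.5
```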
As a result, we obtain in XSCALE.LP:
<pre>

...
</pre>

We note that the "CORRELATION OF COMMON DECAY-FACTORS BETWEEN INPUT DATA SETS" values are very high, which supports the assumption that zero-dose extrapolation is a valid procedure for these data.

Comparison of the last table with that of the previous paragraph, i.e. without zero-dose extrapolation, shows that the I/sigma, the anomalous correlation coefficients and the SigAno are significantly higher. Does this translate into a better structure solution? It does:
...
[[File:1y13-raddam-contrast-raddam.png]]

=== Automatically building the main chain of 452 out of 519 residues ===

Based on the sites obtained by SHELXD, we run
  shelxe.beta -a -q -h -b -s0.585 -m40 raddam raddam_fa
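The -s0.585 option gives SHELXE the solvent fraction. A rough cross-check from the cell constants in the XSCALE.INP above uses Matthews' relation, solvent fraction ≈ 1 - 1.23/V_M (V_M in Å³/Da). The sketch assumes 8 asymmetric units per cell (spacegroup #96), an average residue mass of 110 Da, and that the 519 residues quoted in the heading make up one asymmetric unit; under these assumptions it gives about 0.60, close to the value used.

```python
def matthews_solvent_fraction(a: float, b: float, c: float, n_asu: int,
                              n_residues: int, da_per_residue: float = 110.0) -> float:
    """Solvent fraction from Matthews' relation 1 - 1.23 / V_M for an
    orthogonal cell (alpha = beta = gamma = 90 degrees), with V_M in A^3/Da."""
    cell_volume = a * b * c  # A^3, valid only for 90-degree cell angles
    mass = n_residues * da_per_residue  # Da per asymmetric unit (rough estimate)
    v_m = cell_volume / (n_asu * mass)
    return 1.0 - 1.23 / v_m


solvent = matthews_solvent_fraction(103.316, 103.316, 131.456, n_asu=8, n_residues=519)
print(f"estimated solvent fraction: {solvent:.2f}")
```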
This SHELXE run already builds a significant number of residues, but it also gives an improved list of heavy atom sites: there are actually 6 sites instead of the 5 that SHELXD wrote out (yes, we had asked SHELXD for 3 sites since there are 3 Met residues, but SHELXD, as always, was smarter than we are). We "mv raddam.hat raddam_fa.res" for another run of SHELXE:
  shelxe.beta -a -q -h6 -b -s0.585 -m40 -n3 raddam raddam_fa
and get
<pre>
   452 residues left after pruning, divided into chains as follows:
  A:  15   B:   5   C:  22   D:  22   E: 27   F:  62   G: 263   H:  36

  CC for partial structure against native data =  39.83 %

  ------------------------------------------------------------------------------
  ...
  Global autotracing cycle  4

  <wt> = 0.300, Contrast = 0.447, Connect. = 0.705 for dens.mod. cycle 1
  <wt> = 0.300, Contrast = 0.660, Connect. = 0.781 for dens.mod. cycle 2
  <wt> = 0.300, Contrast = 0.723, Connect. = 0.801 for dens.mod. cycle 3
  <wt> = 0.300, Contrast = 0.762, Connect. = 0.807 for dens.mod. cycle 4
  Pseudo-free CC = 64.88 %
  <wt> = 0.300, Contrast = 0.785, Connect. = 0.810 for dens.mod. cycle 5
  <wt> = 0.300, Contrast = 0.806, Connect. = 0.813 for dens.mod. cycle 6
  <wt> = 0.300, Contrast = 0.820, Connect. = 0.815 for dens.mod. cycle 7
  <wt> = 0.300, Contrast = 0.831, Connect. = 0.817 for dens.mod. cycle 8
  <wt> = 0.300, Contrast = 0.839, Connect. = 0.819 for dens.mod. cycle 9
  Pseudo-free CC = 69.74 %
  <wt> = 0.300, Contrast = 0.845, Connect. = 0.820 for dens.mod. cycle 10
  <wt> = 0.300, Contrast = 0.849, Connect. = 0.821 for dens.mod. cycle 11
  <wt> = 0.300, Contrast = 0.851, Connect. = 0.822 for dens.mod. cycle 12
  <wt> = 0.300, Contrast = 0.853, Connect. = 0.823 for dens.mod. cycle 13
  <wt> = 0.300, Contrast = 0.854, Connect. = 0.823 for dens.mod. cycle 14
  Pseudo-free CC = 70.80 %
  <wt> = 0.300, Contrast = 0.854, Connect. = 0.824 for dens.mod. cycle 15
  <wt> = 0.300, Contrast = 0.855, Connect. = 0.824 for dens.mod. cycle 16
  <wt> = 0.300, Contrast = 0.855, Connect. = 0.824 for dens.mod. cycle 17
  <wt> = 0.300, Contrast = 0.854, Connect. = 0.824 for dens.mod. cycle 18
  <wt> = 0.300, Contrast = 0.854, Connect. = 0.824 for dens.mod. cycle 19
  Pseudo-free CC = 71.03 %
  <wt> = 0.300, Contrast = 0.854, Connect. = 0.824 for dens.mod. cycle 20

  Estimated mean FOM and mapCC as a function of resolution
  d    inf - 4.62 - 3.64 - 3.17 - 2.88 - 2.67 - 2.51 - 2.38 - 2.27 - 2.18 - 2.11
  <FOM>  0.736 0.786 0.768 0.721 0.701 0.681 0.618 0.595 0.587 0.540
  <mapCC> 0.862 0.932 0.946 0.934 0.924 0.924 0.922 0.913 0.882 0.858
  N        4206  4227  4214  4135  4185  4207  4292  4406  4320  3702

  Estimated mean FOM = 0.674   Pseudo-free CC = 71.18 %

  Density (in map sigma units) at input heavy atom sites

   Site    x        y        z    occ*Z    density
     1  0.2276   0.7578  0.1189 34.0000    29.98
     2  0.1568   0.6345   0.3049 32.2898   30.44
     3  0.1767   0.5344   0.2160 32.2388   29.67
     4  0.3059   0.4535   0.1297 26.0746   23.51
     5  0.0280   0.8243   0.1410 22.7324   21.02
     6  0.0383   0.9748   0.0492 21.5050   21.18

  Site    x      y      z  h(sig) near old  near new
   1  0.1569 0.6345 0.3048 30.4 2/0.02 9/13.36 3/15.73 2/19.52 7/22.13
   2  0.2278 0.7578 0.1188 30.0 1/0.02 1/19.52 6/21.97 7/22.48 9/25.02
   3  0.1767 0.5345 0.2158 29.3 3/0.03 9/2.90 1/15.73 4/19.45 2/26.88
   4  0.3060 0.4536 0.1292 23.5 4/0.07 3/19.45 9/21.16 8/26.49 5/26.83
   5  0.0382 0.9748 0.0490 21.6 6/0.02 8/2.63 8/15.66 5/15.88 6/19.80
   6  0.0278 0.8240 0.1416 21.5 5/0.08 5/19.80 8/21.59 7/21.87 2/21.97
   7  0.1854 0.9571 0.1787 -5.0 5/21.86 6/21.87 1/22.13 2/22.48 8/22.57
   8  0.0427 0.9993 0.0530 -5.0 6/2.62 5/2.63 8/15.31 5/15.66 6/21.59
   9  0.1787 0.5611 0.2228 -4.7 3/2.91 3/2.90 1/13.36 4/21.16 2/25.02
</pre>
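The overall "Estimated mean FOM = 0.674" in the SHELXE output above appears to be the average of the per-shell <FOM> values weighted by the number of reflections N in each shell. This is our reading of the output, not SHELXE's documented formula, but the short sketch below, using the per-shell numbers from the output, reproduces the printed value:

```python
def weighted_mean(values, weights):
    """Weighted average sum(v * w) / sum(w)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


# Per-shell values from the SHELXE output above
fom = [0.736, 0.786, 0.768, 0.721, 0.701, 0.681, 0.618, 0.595, 0.587, 0.540]
n_refl = [4206, 4227, 4214, 4135, 4185, 4207, 4292, 4406, 4320, 3702]
print(f"{weighted_mean(fom, n_refl):.3f}")  # 0.674
```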


At this point the structure is obviously solved, and we could use Buccaneer or ARP/wARP to add side chains and the rest of the model. The 3-fold NCS surely helps!

=== Could we do better? ===

Yes, of course (as always). I can think of four things to try:
* an [[optimization]] round of running xds for the two datasets
* using a negative offset for STARTING_DOSE in XSCALE.INP, as documented in the [[XSCALE]] wiki article
* adding the "secondparts" data, assuming they were collected at a longer wavelength
* using MERGE=TRUE in XDSCONV.INP. I tried it, and this gives 20 solutions with CCall+CCweak > 25 out of 1000 trials, whereas MERGE=FALSE (the default) gives only 4 solutions! Update Sep 2011: the [[ccp4com:SHELX_C/D/E#Obtaining_the_SHELX_programs|beta-test version]] of SHELXC should have a fix for this.

== Better phases from DAD (Double Anomalous Dispersion) ==

The reason why pseudo-SAD is described here first is historical: I did it first because I thought that the wavelength could not realistically be changed within 3 seconds, and therefore that the headers were wrong and this was not actually a two-wavelength experiment. Along these lines, I interpreted the correlation coefficient of 1.0 between the first parts of E1 and E2 as indicating that no isomorphous difference exists.
In a discussion with Gerard Bricogne and Clemens Vonrhein after the ACA2011 workshop it turned out that this theory, namely that E1 and E2 were actually collected at the same wavelength, is wrong. This was shown by the difference map between E1 and E2 (calculated with phenix.fobs_minus_fobs_map from the first parts of each dataset, and phased with the 1y13 model), which shows three strong (14-19 sigma) peaks. That the frame 1-370 pieces merge so well seems to be a consequence of the anomalous signal being so similar at the two wavelengths: the dispersive difference between them does not significantly decrease the high correlation coefficient in data scaling.

Thus even better phasing would be obtained by keeping the wavelengths separate and doing MAD (in fact DAD), but zero-dose extrapolation could and should be done in the same way. I have therefore continued the analysis in [[1Y13-DAD]].