(15 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
This is an example of S-SAD structure solution (PDB id [http://www.rcsb.org/pdb/explore.do?structureId=2QVO 2QVO]), a 95-residue protein used by James Tucker Swindell II to establish optimized procedures for data reduction. The data available to solve the structure are two runs of 360° collected at a wavelength of 1.9Å. | |||
==XDS data reduction== | ==XDS data reduction== | ||
In the course of writing this up, it turned out that it was not necessary to scale the two datasets together using [[XSCALE]], because the structure can be solved from either of the two separately. Of course, structure solution would be easier after merging the data (try for yourself!). | |||
===dataset 1=== | ===dataset 1=== | ||
Using | Using [[generate_XDS.INP]] "../../APS/22-ID/2qvo/ACA10_AF1382_1.0???" we obtain: | ||
<pre> | <pre> | ||
JOB= XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE CORRECT | JOB= XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE CORRECT | ||
Line 69: | Line 73: | ||
* 21 tP 7.3 53.5 53.5 41.2 90.1 90.1 90.3 0 1 0 0 0 0 -1 0 -1 0 0 0 | * 21 tP 7.3 53.5 53.5 41.2 90.1 90.1 90.3 0 1 0 0 0 0 -1 0 -1 0 0 0 | ||
39 mC 249.8 114.5 41.2 53.5 90.1 90.3 69.0 1 -2 0 0 1 0 0 0 0 0 1 0 | 39 mC 249.8 114.5 41.2 53.5 90.1 90.3 69.0 1 -2 0 0 1 0 0 0 0 0 1 0 | ||
indicating at most tetragonal symmetry, | indicating at most tetragonal symmetry. Below this table, CORRECT calculates R-factors for each of the lattices whose metric symmetry is compatible with the cell of the crystal (marked by * in the table above): | ||
SPACE-GROUP UNIT CELL CONSTANTS UNIQUE Rmeas COMPARED LATTICE- | SPACE-GROUP UNIT CELL CONSTANTS UNIQUE Rmeas COMPARED LATTICE- | ||
NUMBER a b c alpha beta gamma CHARACTER | NUMBER a b c alpha beta gamma CHARACTER | ||
Line 136: | Line 140: | ||
NUMBER OF UNIQUE ACCEPTED REFLECTIONS 13784 | NUMBER OF UNIQUE ACCEPTED REFLECTIONS 13784 | ||
So the anomalous signal goes to about 3.3 | So the anomalous signal goes to about 3.3 Å (which is where the "Anomal Corr" column drops to 30%), and the useful resolution goes to 2.16 Å, I'd say (please note that this table treats Friedel mates separately; merging them increases I/sigma by another factor of 1.41, i.e. √2). | ||
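The factor of 1.41 mentioned above is simply √2. A minimal sketch of the arithmetic, assuming the two Friedel mates are independent measurements of equal precision and that the anomalous difference is small compared to the mean intensity (the function name is hypothetical, not part of XDS):

```python
import math


def merged_i_over_sigma(i_over_sigma_separate: float, n_merged: int = 2) -> float:
    """I/sigma after averaging n_merged independent, equally precise measurements.

    Averaging leaves the mean intensity unchanged and divides sigma by
    sqrt(n_merged), so I/sigma grows by sqrt(n_merged).
    """
    return i_over_sigma_separate * math.sqrt(n_merged)


# Gain from merging the two Friedel mates of a reflection
print(round(merged_i_over_sigma(1.0), 2))  # 1.41
```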
For the sake of comparability, from now on we use the same axes (53.03 53.03 40.97) as the deposited PDB id 2QVO. | For the sake of comparability, from now on we use the same axes (53.03 53.03 40.97) as the deposited PDB id 2QVO. | ||
Line 164: | Line 168: | ||
===dataset 2=== | ===dataset 2=== | ||
This works exactly the same way as dataset 1. The table in CORRECT.LP is | This works exactly the same way as dataset 1. The geometry refinement is surprisingly bad: | ||
REFINED PARAMETERS: DISTANCE BEAM ORIENTATION CELL AXIS | |||
USING 49218 INDEXED SPOTS | |||
STANDARD DEVIATION OF SPOT POSITION (PIXELS) 1.78 | |||
STANDARD DEVIATION OF SPINDLE POSITION (DEGREES) 0.15 | |||
CRYSTAL MOSAICITY (DEGREES) 0.218 | |||
DIRECT BEAM COORDINATES (REC. ANGSTROEM) 0.002198 -0.000174 0.526311 | |||
DETECTOR COORDINATES (PIXELS) OF DIRECT BEAM 1991.28 2027.42 | |||
DETECTOR ORIGIN (PIXELS) AT 1984.09 2027.99 | |||
CRYSTAL TO DETECTOR DISTANCE (mm) 126.03 | |||
LAB COORDINATES OF DETECTOR X-AXIS 1.000000 0.000000 0.000000 | |||
LAB COORDINATES OF DETECTOR Y-AXIS 0.000000 1.000000 0.000000 | |||
LAB COORDINATES OF ROTATION AXIS 0.999979 0.002580 -0.006016 | |||
COORDINATES OF UNIT CELL A-AXIS -31.728 -7.177 -42.595 | |||
COORDINATES OF UNIT CELL B-AXIS 40.575 13.173 -32.443 | |||
COORDINATES OF UNIT CELL C-AXIS 11.394 -39.576 -1.819 | |||
REC. CELL PARAMETERS 0.018658 0.018658 0.024258 90.000 90.000 90.000 | |||
UNIT CELL PARAMETERS 53.595 53.595 41.224 90.000 90.000 90.000 | |||
E.S.D. OF CELL PARAMETERS 1.0E-02 1.0E-02 1.7E-02 0.0E+00 0.0E+00 0.0E+00 | |||
SPACE GROUP NUMBER 75 | |||
with its large "STANDARD DEVIATION OF SPOT POSITION (PIXELS)", which may indicate a slipping crystal, or cell parameters changing due to radiation damage. However, no indication of either is found in the repeated refinements listed in INTEGRATE.LP, so we do not know what to attribute this problem to! | |||
The main table in CORRECT.LP is | |||
NOTE: Friedel pairs are treated as different reflections. | NOTE: Friedel pairs are treated as different reflections. | ||
Line 190: | Line 216: | ||
NUMBER OF UNIQUE ACCEPTED REFLECTIONS 13738 | NUMBER OF UNIQUE ACCEPTED REFLECTIONS 13738 | ||
Dataset 2 is definitively better than dataset 1. | Dataset 2 is definitively better than dataset 1. Note that the number of misfits is more than 2.5% whereas one should expect about 1% (with WFAC1=1). | ||
==SHELXC/D/E structure solution== | ==SHELXC/D/E structure solution== | ||
This is done in a subdirectory of the XDS data reduction directory ( | This is done in a subdirectory of the XDS data reduction directory (of dataset "1" or "2"). Here, we use a script to generate XDSCONV.INP (I used MERGE=TRUE, sometimes the results are better that way; update Sep 2011: the [[ccp4com:SHELX_C/D/E#Obtaining_the_SHELX_programs|beta-test version of SHELXC]] fixes this problem, so MERGE=FALSE would be preferable since it gives more statistics output), run [[XDSCONV|xdsconv]] and [[ccp4com:SHELX_C/D/E|SHELXC]]. | ||
<pre> | <pre> | ||
#!/bin/csh -f | #!/bin/csh -f | ||
Line 240: | Line 266: | ||
shelxd j_fa | shelxd j_fa | ||
This gives best CC All/Weak of 37.28 / 21.38 for dataset 1, and best CC All/Weak of 37.89 / 23.80 for dataset 2 | The "FIND 3" needs a comment: the sequence has 4 Met and 1 Cys, but we don't expect to find the N-terminal Met. Since SHELXD always searches for more atoms than specified, we might as well tell it to try and locate 3 sulfurs. | ||
This gives best CC All/Weak of 37.28 / 21.38 for dataset 1, and best CC All/Weak of 37.89 / 23.80 for dataset 2. | |||
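The sulfur count behind the "FIND" value can be sketched as follows. This is only an illustration: the sequence below is a made-up toy string with the same composition as stated above (4 Met, 1 Cys), not the real 2QVO sequence, and the function name is hypothetical.

```python
def count_ordered_sulfurs(sequence: str, skip_n_terminal_met: bool = True) -> int:
    """Count Met and Cys residues, optionally ignoring an N-terminal Met
    (which is often disordered and therefore not expected as a site)."""
    seq = sequence.upper()
    n_sites = seq.count("M") + seq.count("C")
    if skip_n_terminal_met and seq.startswith("M"):
        n_sites -= 1
    return n_sites


# Hypothetical toy sequence: 4 Met (one of them N-terminal) and 1 Cys
toy_sequence = "MKCLMAMGSM"
print(count_ordered_sulfurs(toy_sequence))  # 4
```

Since SHELXD searches for more atoms than specified, a "FIND" value slightly below this count (3 here) is what was used above.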
Next we run G. Sheldrick's beta-Version of [[ccp4com:SHELX_C/D/E|SHELXE]] Version 2011/1: | Next we run G. Sheldrick's beta-Version of [[ccp4com:SHELX_C/D/E|SHELXE]] Version 2011/1: | ||
shelxe.beta j j_fa -a -q -h -s0.55 -m20 -b | shelxe.beta j j_fa -a -q -h -s0.55 -m20 -b | ||
and | and the inverse hand: | ||
shelxe.beta j j_fa -a -q -h -s0.55 -m20 -b -i | shelxe.beta j j_fa -a -q -h -s0.55 -m20 -b -i | ||
Line 317: | Line 345: | ||
==Can we do better?== | ==Can we do better?== | ||
===data reduction=== | ===data reduction=== | ||
The safest way to optimize the data reduction is to look at external quality indicators. Internal R-factors, and even the correlation coefficient of the anomalous signal are of comparatively little value. A readily available external quality indicator is CC All/CC Weak as obtained by [[ccp4com:SHELX_C/D/E|SHELXD]]. | The safest way to optimize the data reduction is to look at external quality indicators. Internal R-factors, and even the correlation coefficient of the anomalous signal are of comparatively little value. A readily available external quality indicator is CC All/CC Weak as obtained by [[ccp4com:SHELX_C/D/E|SHELXD]], and the percentage of successful trials. | ||
I tried a number of possibilities: | |||
* [[Optimization]] by "re-cycling" GXPARM.XDS to XPARM.XDS and re-running INTEGRATE, coupled with REFINE(INTEGRATE)= ! (empty list) and specifying BEAM_DIVERGENCE_E.S.D. and similar parameters as obtained from INTEGRATE.LP: this quite often helps to improve geometry a bit but had no clear effect here. | |||
* STRICT_ABSORPTION_CORRECTION=TRUE - this is useful if the chi^2 -values of the three scaling steps in CORRECT.LP are 1.5 and higher which is not the case here. Consequently this also had no clear effect. | |||
* increasing MAXIMUM_ERROR_OF_SPOT_POSITION from its default of 3 to ( 3 * STANDARD DEVIATION OF SPOT POSITION (PIXELS)) which would mean increasing to 5 here: no clear effect. | |||
* increasing WFAC1 : this was suggested by the number of misfits which is clearly higher than the usual 1 % of observations. WFAC1=1.5 has indeed a very positive effect on SHELXD: for dataset 1, the best CC All/Weak becomes '''44.93 / 22.82''' (dataset 2: '''48.11 / 27.78'''), and the number of successful trials goes from about 60% to 91% (dataset 2: 94%). '''One should note that all internal quality indicators get worse when increasing WFAC1 - but the external ones got significantly better!''' The number of misfits with WFAC1=1.5 dropped to 196 / 436 for datasets 1 and 2, respectively. | |||
* MERGE=FALSE vs MERGE=TRUE in XDSCONV.INP: after finding out about WFAC1 I tried MERGE=FALSE (the default !) and it turned out to be a bit better - best CC All/Weak '''48.66 / 28.05''' for dataset 2. On the other hand, the number of successful trials went down to 77% (from 94%). This result is somewhat difficult to interpret, but I like MERGE=TRUE better. | |||
We may thus conclude that in this case the rejection of misfits beyond the target value of 1% reduces data quality significantly. In (other) desperate cases, if SHELXD produces no successful trials, it may be worth trying WFAC1=1.5, provided the number of misfits is high. | |||
We also learn that it's usually ''not'' going to help much to deviate from the defaults (MERGE=, MAXIMUM_ERROR_OF_SPOT_POSITION=, STRICT_ABSORPTION_CORRECTION=) unless there is a clear reason (high number of misfits) to! | |||
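To make the comparison above easier to follow, here is a small sketch that tabulates the SHELXD numbers quoted in this article and reproduces the "3 * STANDARD DEVIATION OF SPOT POSITION" rule of thumb. The dictionary layout and function names are my own assumptions for illustration; only the numbers come from the text.

```python
# Best CC All / CC Weak from SHELXD as quoted in the text,
# keyed by (dataset, WFAC1, MERGE setting in XDSCONV.INP).
SHELXD_CC = {
    ("1", 1.0, "TRUE"): (37.28, 21.38),
    ("2", 1.0, "TRUE"): (37.89, 23.80),
    ("1", 1.5, "TRUE"): (44.93, 22.82),
    ("2", 1.5, "TRUE"): (48.11, 27.78),
    ("2", 1.5, "FALSE"): (48.66, 28.05),
}


def cc_all_gain(dataset: str, merge: str = "TRUE") -> float:
    """Gain in best CC All when going from WFAC1=1 to WFAC1=1.5."""
    before = SHELXD_CC[(dataset, 1.0, merge)][0]
    after = SHELXD_CC[(dataset, 1.5, merge)][0]
    return round(after - before, 2)


def spot_position_limit(sigma_pixels: float) -> int:
    """Rule of thumb tried above: 3 * standard deviation of spot position."""
    return round(3 * sigma_pixels)


for dataset in ("1", "2"):
    print(f"dataset {dataset}: CC All gain = {cc_all_gain(dataset)}")

# 1.78 pixels is the value reported for dataset 2 in CORRECT.LP
print("MAXIMUM_ERROR_OF_SPOT_POSITION =", spot_position_limit(1.78))
```

The gains in CC All (several percentage points for both datasets) are large compared to the small difference between MERGE=TRUE and MERGE=FALSE, which is why WFAC1 is the setting that matters here.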
===structure solution=== | ===structure solution=== | ||
Line 329: | Line 362: | ||
The resolution limit for SHELXD could be varied. For SHELXE, the solvent content could be varied, and the number of autobuilding cycles, and probably also the high resolution cutoff. Furthermore, it would be advantageous to "re-cycle" the file j.hat to j_fa.res, since the heavy-atom sites from SHELXE are more accurate than those from SHELXD, as the phases derived from the poly-Ala traces are quite good (compare the density columns of the two consecutive heavy-atom lists!). | The resolution limit for SHELXD could be varied. For SHELXE, the solvent content could be varied, and the number of autobuilding cycles, and probably also the high resolution cutoff. Furthermore, it would be advantageous to "re-cycle" the file j.hat to j_fa.res, since the heavy-atom sites from SHELXE are more accurate than those from SHELXD, as the phases derived from the poly-Ala traces are quite good (compare the density columns of the two consecutive heavy-atom lists!). | ||
== | With the optimally-reduced dataset 2, I get from SHELXE: | ||
Density (in map sigma units) at input heavy atom sites | |||
Site x y z occ*Z density | |||
1 0.3361 0.9695 0.9827 16.0000 24.15 | |||
2 0.3708 1.1540 1.0380 14.5216 17.48 | |||
3 0.1576 1.2210 1.1222 9.2848 12.60 | |||
4 0.4807 1.1304 1.0314 7.2224 8.95 | |||
5 0.4539 1.1750 1.0368 6.6224 7.26 | |||
Site x y z h(sig) near old near new | |||
1 0.3380 0.9687 0.9828 24.3 1/0.11 6/2.40 2/10.33 4/11.42 4/11.81 | |||
2 0.3732 1.1546 1.0426 18.1 2/0.23 5/4.00 4/5.67 6/9.92 1/10.33 | |||
3 0.1637 1.2180 1.1226 13.5 3/0.36 2/12.06 5/15.47 6/15.97 1/17.12 | |||
4 0.4784 1.1371 1.0333 9.3 4/0.38 5/2.89 2/5.67 1/11.42 1/11.81 | |||
5 0.4439 1.1791 1.0300 9.0 5/0.64 4/2.89 2/4.00 6/12.54 1/12.64 | |||
6 0.3273 0.9734 1.0393 -5.9 1/2.38 1/2.40 2/9.92 4/11.82 4/11.86 | |||
so the density is better, but not much. Furthermore, we note in passing that the number of anomalous scatterers (5) matches the sum of 4 Met and 1 Cys in the sequence. | |||
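The statement "better, but not much" can be checked by comparing the two lists site by site. A minimal sketch, assuming that the "density" column of the first list and the "h(sig)" column of the second list are directly comparable (both in map sigma units) and that sites 1-5 correspond one-to-one, as the small "near old" distances suggest:

```python
# Values copied from the SHELXE output above (sites 1-5).
density_at_input_sites = [24.15, 17.48, 12.60, 8.95, 7.26]
height_at_revised_sites = [24.3, 18.1, 13.5, 9.3, 9.0]

# Per-site gain in peak height after SHELXE revised the positions
gains = [
    round(new - old, 2)
    for old, new in zip(density_at_input_sites, height_at_revised_sites)
]
mean_gain = sum(gains) / len(gains)

for site, gain in enumerate(gains, start=1):
    print(f"site {site}: +{gain:.2f} sigma")
print(f"mean gain: {mean_gain:.2f} sigma")
```

Every site gains, with the weakest site improving most, but the average gain is below one sigma.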
==Exploring the limits== | |||
With dataset 2, I tried to use the first 270 frames and could indeed solve the structure using the above SHELXC/D/E approach (with WFAC1=1.5) - 85 residues in a single chain, with "CC for partial structure against native data = 47.51 %". It should be mentioned that I also tried this in November 2009, and it didn't work with the version of XDS available then! | |||
With 180 frames, it was possible to get a complete model by (twice) re-cycling the j.hat file to j_fa.res. '''This means that the structure can be automatically solved just from the first 180 frames of dataset 2!''' | |||
==Availability== | |||
* [https://{{SERVERNAME}}/pub/xds-datared/2qvo/xds-2qvo-1-1_360-F.mtz] - amplitudes for frames 1-360 of dataset 1. | |||
* [https://{{SERVERNAME}}/pub/xds-datared/2qvo/xds-2qvo-1-1_360-I.mtz] - intensities for frames 1-360 of dataset 1. | |||
* [https://{{SERVERNAME}}/pub/xds-datared/2qvo/xds-2qvo-2-1_180-F.mtz] - amplitudes for frames 1-180 of dataset 2. | |||
* [https://{{SERVERNAME}}/pub/xds-datared/2qvo/xds-2qvo-2-1_180-I.mtz] - intensities for frames 1-180 of dataset 2. | |||
* [https://{{SERVERNAME}}/pub/xds-datared/2qvo/xds-2qvo-2-1_360-F.mtz] - amplitudes for frames 1-360 of dataset 2. | |||
* [https://{{SERVERNAME}}/pub/xds-datared/2qvo/xds-2qvo-2-1_360-I.mtz] - intensities for frames 1-360 of dataset 2. | |||
As you can see, all these files are in the same directory [https://{{SERVERNAME}}/pub/xds-datared/2qvo/]. I put there the XDS_ASCII.HKL files and SHELXD/SHELXE result files as well. |