This article deals with how to process serial synchrotron crystallography (SSX) data.
The data processed here are simulated; they were prepared by James Holton. The files Illuin_microfocus_minimal_00[1-3].tar.bz2 can be [http://bl831.als.lbl.gov/example_data_sets/tarballs downloaded], and the data and the problem are described on his [http://bl831.als.lbl.gov/~jamesh/challenge/microfocus microfocus challenge page] and in a [http://journals.iucr.org/d/issues/2019/02/00/ba5297/index.html paper].
The challenges are
# partial data sets: each of the 100 data sets has only 3 good frames of 1° oscillation; later frames suffer from strong radiation damage
# the crystals decay: their diffraction drops to about half within these 3 frames
# the b and c axes have the same length, but the simulated crystals are orthorhombic. This makes it difficult to index them consistently. Merging them in an orthorhombic space group without resolving the indexing ambiguity is wrong, because it yields a pseudo-tetragonal, twinned merged data set.
The solution is to use [[XSCALE]] for scaling, and [[xscale_isocluster]] for analysing the scaled data.
== Round 1: processing the data, and determining the space group ==
In order to merge the data in XSCALE, we must ensure that they are all processed in the same space group, with similar cell parameters. Some exploratory processing (not shown) and averaging of cell parameters reveal that IDXREF finds a primitive lattice with one axis of 38.3 Å and two of 79.1 Å; all angles are 90°. The data extend to 1.8 Å; beyond that, the intensities suddenly drop to 0, presumably because James Holton simulated them only that far.
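The averaging step could look like the following minimal sketch. It assumes that the exploratory runs left one XDS_ASCII.HKL per wedge directory and that all of them report the cell with the axes in the same order. <code>average_cell</code> is a hypothetical helper that reads only the <code>!UNIT_CELL_CONSTANTS=</code> header line of each file.

```shell
#!/bin/bash
# Hypothetical helper: average the six cell constants found in the
# "!UNIT_CELL_CONSTANTS=" header lines of XDS_ASCII.HKL files.
# Assumes every file lists the axes in the same order.
average_cell() {   # arguments: XDS_ASCII.HKL files (stdin if none)
  awk '
    /^!UNIT_CELL_CONSTANTS=/ {
      sub(/^!UNIT_CELL_CONSTANTS=/, "")   # keep only the six numbers
      for (i = 1; i <= 6; i++) sum[i] += $i
      n++
    }
    END {
      if (n == 0) { print "no UNIT_CELL_CONSTANTS found" > "/dev/stderr"; exit 1 }
      for (i = 1; i <= 6; i++) printf "%.1f%s", sum[i] / n, (i < 6 ? " " : "\n")
    }
  ' "$@"
}

# usage: average_cell wedge0*/XDS_ASCII.HKL
```

If different wedges come out with permuted axes, their cells have to be brought into one setting first; a plain average over mixed settings is meaningless.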
Using the following processing script, integrate.rc:
<pre>
#!/bin/bash -f
# Round 1: integrate each wedge in P1 with the averaged cell
for f in `seq 1 100`;
do
export OUT=wedge0`printf "%03d" $f`
export NAMES="$PWD/Illuin/microfocus/xtal"`printf "%03d" $f`"_1_00\?.img"
rm -rf $OUT
mkdir $OUT
cd $OUT
generate_XDS.INP $NAMES
# edit the generated XDS.INP: spot search on frames 1-3, P1, averaged cell, trusted region, 1.8 Å limit
sed -i s"/SPOT_RANGE=1 1/SPOT_RANGE=1 3/" XDS.INP
sed -i s"/SPACE_GROUP_NUMBER=0/SPACE_GROUP_NUMBER=1/" XDS.INP
sed -i s"/UNIT_CELL_CONSTANTS= 70 80 90/UNIT_CELL_CONSTANTS=38.3 79.1 79.1/" XDS.INP
sed -i s"/TRUSTED_REGION=0.0 1.2/TRUSTED_REGION=0 1/" XDS.INP
sed -i s"/INCLUDE_RESOLUTION_RANGE=50 0/INCLUDE_RESOLUTION_RANGE=99 1.8/" XDS.INP
/usr/local/bin/xds_par
cd ..
done
# scale all wedges together
mkdir xscale
cd xscale
cat >XSCALE.INP <<eof
SPACE_GROUP_NUMBER= 1
UNIT_CELL_CONSTANTS= 38.3 79.1 79.1 90 90 90
OUTPUT_FILE=temp.ahkl
SAVE_CORRECTION_IMAGES=FALSE
FRIEDEL'S_LAW=TRUE
eof
# "bash -f" (first line) disables globbing; re-enable it so that wedge* is expanded
set +f
find $PWD/../wedge* -name XDS_ASCII.HKL | awk '{print "INPUT_FILE=",$0;print "NBATCH=1 CORRECTIONS=ALL"}' >> XSCALE.INP
</pre>
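Since <code>sed -i</code> reports success even when its pattern does not match, a default line in XDS.INP that differs from the one assumed in the script would silently stay unchanged. A hypothetical check, <code>check_xds_inp</code> (not part of XDS or generate_XDS.INP), could verify the edited keywords in every wedge; the expected values are copied from the script.

```shell
#!/bin/bash
# Hypothetical check: report settings that the sed edits of integrate.rc
# failed to put into XDS.INP. Returns 1 if anything is missing.
check_xds_inp() {   # $1 = path to XDS.INP, $2 = expected SPACE_GROUP_NUMBER
  local inp=$1 sg=$2 missing=0 pattern
  for pattern in \
      "SPOT_RANGE=1 3" \
      "SPACE_GROUP_NUMBER=$sg" \
      "UNIT_CELL_CONSTANTS=38\.3 79\.1 79\.1" \
      "TRUSTED_REGION=0 1" \
      "INCLUDE_RESOLUTION_RANGE=99 1\.8"; do
    # the value must not continue with another digit (so "=1" does not match "=19")
    if ! grep -qE -- "${pattern}([^0-9.]|\$)" "$inp"; then
      echo "$inp: no line matching '$pattern'" >&2
      missing=1
    fi
  done
  return "$missing"
}

# usage after the loop: for d in wedge0*; do check_xds_inp "$d/XDS.INP" 1; done
```

For the second round of integration, the expected space group number would be 19 instead of 1.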
we obtain in P1
<pre>
SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
[...]
total 256616 81419 86791 93.8% 54.3% 51.3% 243147 1.43 64.0% 64.6* 0 0.788 17331
</pre>
and feed this to pointless:
 pointless xdsin temp.ahkl
which tells us
<pre>
Scores for each symmetry element
[...]
</pre>
Based on the P4<sub>2</sub>2<sub>1</sub>2 suggestion (space group No. 94), we may try changing the header of XSCALE.INP to
<pre>
SPACE_GROUP_NUMBER= 94
UNIT_CELL_CONSTANTS= 79.1 79.1 38.3 90 90 90
OUTPUT_FILE=temp.ahkl
SAVE_CORRECTION_IMAGES=FALSE
FRIEDEL'S_LAW=TRUE
REIDX=0 1 0 0 0 0 1 0 1 0 0 0
</pre>
where the last line (REIDX) permutes the indices into the order k,l,h. This is needed because the XDS_ASCII.HKL files are in P1 with a,b,c = 38.3, 79.1, 79.1 Å, whereas the tetragonal setting requires the unique 38.3 Å axis to be c. We obtain
<pre>
SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION NUMBER OF REFLECTIONS COMPLETENESS R-FACTOR R-FACTOR COMPARED I/SIGMA R-meas CC(1/2) Anomal SigAno Nano
LIMIT OBSERVED UNIQUE POSSIBLE OF DATA observed expected Corr
8.03 2978 167 167 100.0% 53.6% 45.8% 2978 5.94 55.1% 99.2* 22 1.190 76
5.68 5488 274 274 100.0% 54.0% 46.1% 5488 6.12 55.4% 97.0* 20 0.915 175
4.64 6976 338 338 100.0% 55.4% 46.1% 6976 6.25 57.0% 99.1* 15 0.983 237
4.01 8069 390 390 100.0% 57.5% 46.3% 8069 6.01 59.0% 93.7* 8 0.991 294
3.59 9191 440 440 100.0% 63.9% 46.7% 9191 5.80 65.5% 89.2* 3 1.071 338
3.28 10239 474 474 100.0% 63.8% 47.0% 10239 5.85 65.4% 89.4* 4 1.119 375
3.03 11037 511 511 100.0% 66.0% 47.5% 11037 5.33 67.6% 91.7* 3 1.068 412
2.84 12014 547 547 100.0% 69.6% 49.1% 12014 4.80 71.2% 82.2* -1 1.092 447
2.68 12698 580 580 100.0% 72.2% 51.0% 12698 4.34 73.9% 83.8* -7 0.969 478
2.54 13360 612 612 100.0% 73.5% 54.1% 13360 3.98 75.3% 73.4* 4 1.025 511
2.42 14299 642 642 100.0% 76.8% 58.2% 14299 3.59 78.6% 57.0* 6 1.016 545
2.32 14827 667 667 100.0% 77.8% 62.3% 14827 3.38 79.6% 70.3* 1 0.924 563
2.23 15588 698 698 100.0% 79.5% 64.6% 15588 3.22 81.3% 64.9* -1 0.914 597
2.15 15888 705 705 100.0% 79.3% 68.0% 15888 3.23 81.1% 52.5* -5 0.882 614
2.07 16867 754 754 100.0% 82.7% 74.7% 16867 2.92 84.6% 50.1* 3 0.920 647
2.01 16847 754 754 100.0% 86.1% 77.3% 16847 2.73 88.1% 47.6* -3 0.839 658
1.95 17842 799 799 100.0% 90.4% 86.7% 17842 2.47 92.4% 49.3* 1 0.822 696
1.89 18095 810 811 99.9% 96.8% 101.2% 18095 2.21 99.1% 44.6* -4 0.773 707
1.84 18633 829 829 100.0% 106.4% 126.3% 18633 1.90 108.9% 39.6* -6 0.730 736
1.80 15510 824 863 95.5% 118.1% 151.4% 15500 1.46 121.2% 32.3* 2 0.688 699
total 256446 11815 11855 99.7% 64.9% 51.6% 256436 3.61 66.5% 97.9* 1 0.910 9805
</pre>
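What the REIDX line does to the indices can be illustrated with a short awk sketch. It assumes that the 12 integers are read row by row as h' = r1·h + r2·k + r3·l + r4 (and likewise for k' and l'), which reproduces the k,l,h permutation described above. <code>reidx</code> is a hypothetical function for illustration only; XSCALE applies REIDX internally.

```shell
#!/bin/bash
# Hypothetical illustration of an XSCALE-style REIDX matrix: 12 integers,
# read (by assumption) as three rows of four: h' = r1*h + r2*k + r3*l + r4, etc.
reidx() {   # $1 = the 12 integers as one string; "h k l" lines on stdin
  awk -v m="$1" '
    BEGIN {
      if (split(m, r, " ") != 12) { print "need 12 integers" > "/dev/stderr"; exit 1 }
    }
    {
      h = $1; k = $2; l = $3
      print r[1]*h + r[2]*k + r[3]*l + r[4],
            r[5]*h + r[6]*k + r[7]*l + r[8],
            r[9]*h + r[10]*k + r[11]*l + r[12]
    }
  '
}

# usage: echo "1 2 3" | reidx "0 1 0 0 0 0 1 0 1 0 0 0"    # prints 2 3 1, i.e. (k,l,h)
```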
Analysis with
 xscale_isocluster -dim 2 -clu 2 temp.ahkl
yields an iso.pdb that shows no compact cluster, but a single, severely elongated cloud:
[[File:1g1c-94.png]]
(If the space group were correct, the result of [[xscale_isocluster]] should look similar to this:
[[File:Lyso-xscale-isocluster.png]]
which is from a lysozyme SSX data collection performed at the SLS; outliers are labelled. In this case, the data are truly tetragonal.)
We must therefore investigate whether the data have lower than tetragonal symmetry.
Running XSCALE with
 SPACE_GROUP_NUMBER=16
 UNIT_CELL_CONSTANTS=38.3 79.1 79.1 90 90 90
(P222, with the axes in their original P1 order, so no REIDX line is needed) gives a new temp.ahkl with orthorhombic symmetry.
 xscale_isocluster -dim 2 -clu 2 temp.ahkl
gives
<pre>
psi= 0.1692468 nhalo= 0
cluster: 1 center: 2 elements: 51 core: 51 halo: 0
cluster: 2 center: 6 elements: 49 core: 49 halo: 0
</pre>
and writes XSCALE.1.INP and XSCALE.2.INP for further use. Each of these files collects the XDS_ASCII.HKL files of one cluster; they are indexed consistently within a cluster, but differently from those of the other cluster.
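To see which wedges ended up in which cluster, a hypothetical helper, <code>list_cluster</code>, can number the INPUT_FILE= lines of XSCALE.1.INP or XSCALE.2.INP. The sketch assumes one INPUT_FILE= line per data set and strips a leading <code>*</code> (XSCALE's marker for a reference data set), if present, from the file name.

```shell
#!/bin/bash
# Hypothetical helper: number and list the data sets (INPUT_FILE= lines) of an
# XSCALE input file such as XSCALE.1.INP written by xscale_isocluster.
list_cluster() {   # $1 = XSCALE input file
  awk '/^[ \t]*INPUT_FILE=/ {
         line = $0
         sub(/^[ \t]*INPUT_FILE=[ \t]*/, "", line)   # drop the keyword
         split(line, w, " ")                          # w[1] = file name
         f = w[1]
         sub(/^\*/, "", f)                            # drop a reference-set "*"
         printf "%d %s\n", ++n, f
       }' "$1"
}

# usage: list_cluster XSCALE.1.INP | wc -l    # 51 in the run shown here
```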
 coot iso.pdb
shows
[[File:Coot.png]]
and thus reveals two well-separated clouds, corresponding to the two possible indexing modes of the data in an orthorhombic space group.
Using XSCALE.1.INP with its 51 XDS_ASCII.HKL files, and FRIEDEL'S_LAW=TRUE, we get
<pre>
SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION NUMBER OF REFLECTIONS COMPLETENESS R-FACTOR R-FACTOR COMPARED I/SIGMA R-meas CC(1/2) Anomal SigAno Nano
LIMIT OBSERVED UNIQUE POSSIBLE OF DATA observed expected Corr
8.03 1493 297 306 97.1% 11.8% 23.7% 1467 6.04 13.0% 98.2* 52* 0.662 123
5.68 2829 514 521 98.7% 18.9% 24.2% 2796 5.98 20.9% 96.1* 26* 0.778 258
4.64 3576 638 646 98.8% 23.3% 24.2% 3554 6.07 25.7% 93.3* 12 0.829 346
4.01 4140 748 756 98.9% 28.2% 24.5% 4105 5.84 31.0% 89.4* -5 0.818 418
3.59 4735 838 852 98.4% 30.9% 25.0% 4709 5.72 33.9% 86.7* 5 0.983 470
3.28 5268 912 921 99.0% 34.7% 25.8% 5228 5.52 38.0% 85.9* 0 1.005 533
3.03 5664 982 994 98.8% 37.8% 27.4% 5634 4.90 41.4% 82.1* 4 1.031 563
2.84 6114 1065 1068 99.7% 40.4% 31.7% 6082 4.13 44.4% 82.5* 5 0.963 613
2.68 6486 1127 1133 99.5% 44.5% 37.2% 6450 3.54 48.9% 74.8* 1 0.824 644
2.54 6819 1188 1197 99.2% 48.2% 44.6% 6784 3.01 53.0% 70.4* 1 0.816 709
2.42 7278 1249 1259 99.2% 51.9% 54.7% 7249 2.56 56.9% 70.6* 4 0.751 756
2.32 7595 1297 1304 99.5% 55.9% 63.4% 7555 2.26 61.5% 58.5* 4 0.729 809
2.23 7943 1361 1371 99.3% 57.8% 66.4% 7903 2.16 63.3% 63.5* -3 0.687 844
2.15 8093 1375 1385 99.3% 60.1% 75.4% 8054 2.03 65.9% 66.7* 3 0.664 860
2.07 8561 1476 1482 99.6% 64.8% 88.3% 8512 1.76 71.1% 53.0* 7 0.640 914
2.01 8613 1473 1482 99.4% 68.3% 95.8% 8570 1.60 74.9% 60.6* -1 0.628 928
1.95 9048 1566 1571 99.7% 73.1% 112.2% 9004 1.41 80.2% 56.7* -3 0.571 966
1.89 9236 1580 1593 99.2% 82.6% 142.1% 9204 1.19 90.8% 56.3* -5 0.504 1000
1.84 9467 1618 1631 99.2% 92.8% 180.0% 9432 0.96 101.9% 43.2* 4 0.467 1007
1.80 7927 1570 1701 92.3% 104.8% 225.2% 7811 0.70 116.1% 42.6* -5 0.425 785
total 130885 22874 23173 98.7% 38.3% 41.0% 130103 2.77 42.1% 92.0* 3 0.703 13546
</pre>
At this point, we run
 xdscc12 -w XSCALE.1.HKL | grep ^a | sort -nk6
and find that data sets 1 and 17 were wrongly assigned to the cloud of 51 data sets. They are therefore removed manually from XSCALE.INP.
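Instead of deleting the entries by hand, the INPUT_FILE= blocks could also be dropped by their running number. This sketch assumes that xdscc12 numbers data sets in the order of the INPUT_FILE= lines, and that all lines after an INPUT_FILE= up to the next one (such as NBATCH= and CORRECTIONS=) belong to that data set. <code>drop_datasets</code> is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: remove data sets by running number from an XSCALE
# input file. Header lines before the first INPUT_FILE= are always kept; each
# dropped INPUT_FILE= takes the following lines up to the next INPUT_FILE= with it.
drop_datasets() {   # $1 = space-separated numbers, e.g. "1 17"; input file on stdin
  awk -v drop="$1" '
    BEGIN { n = split(drop, d, " "); for (i = 1; i <= n; i++) skip[d[i] + 0] = 1 }
    /^[ \t]*INPUT_FILE=/ { nset++; dropping = (nset in skip) }
    !dropping { print }
  '
}

# usage: drop_datasets "1 17" < XSCALE.1.INP > XSCALE.INP
```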
After <code>xscale_isocluster -dim 2 -clu 1</code>,
 coot iso.pdb
now reveals a single cloud:
[[File:1g1c-19.png]]
We then re-run XSCALE with MERGE=TRUE. The resulting reflection file XSCALE.1.HKL is used as REFERENCE_DATA_SET for a second round of integration with XDS.
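The round-2 integrate.rc reads the reference as <code>../reference.hkl</code> from each wedge directory, so the merged file has to be placed in the top-level directory under that name. A hypothetical helper, <code>prepare_reference</code>, could first check for <code>MERGE=TRUE</code> in the <code>!FORMAT=</code> header line (assumed to be written there by XSCALE for merged output) before copying.

```shell
#!/bin/bash
# Hypothetical helper: copy a merged XSCALE output file to reference.hkl,
# refusing files whose "!FORMAT=" header line does not say MERGE=TRUE.
prepare_reference() {   # $1 = merged file (e.g. xscale/XSCALE.1.HKL), $2 = target directory
  local src=$1 dest=${2:-.}
  if ! grep -m 1 '^!FORMAT=' "$src" | grep -q 'MERGE=TRUE'; then
    echo "$src: header does not say MERGE=TRUE" >&2
    return 1
  fi
  cp "$src" "$dest/reference.hkl"
}

# usage, in the top-level directory: prepare_reference xscale/XSCALE.1.HKL .
```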
 pointless xdsin XSCALE.1.HKL
gives
<pre>
Spacegroup TotProb SysAbsProb Reindex Conditions
P 21 21 21 ( 19) 0.896 0.924 h00: h=2n, 0k0: k=2n, 00l: l=2n (zones 1,2,3)
..........
P 2 21 21 ( 18) 0.044 0.045 0k0: k=2n, 00l: l=2n (zones 2,3)
..........
P 21 21 2 ( 18) 0.015 0.015 h00: h=2n, 0k0: k=2n (zones 1,2)
..........
P 21 2 21 ( 18) 0.014 0.014 h00: h=2n, 00l: l=2n (zones 1,3)
---------------------------------------------------------------
Space group confidence (= Sqrt(Score * (Score - NextBestScore))) = 0.87
Laue group confidence (= Sqrt(Score * (Score - NextBestScore))) = 0.97
Selecting space group P 21 21 21 as there is a single space group with the highest score
<!--SUMMARY_BEGIN--> $TEXT:Result: $$ $$
Best Solution: space group P 21 21 21
Reindex operator: [h,k,l]
Laue group probability: 0.970
Systematic absence probability: 0.924
Total probability: 0.896
Space group confidence: 0.874
Laue group confidence 0.966
Unit cell: 38.30 79.10 79.10 90.00 90.00 90.00
79.10 to 2.47 - Resolution range used for Laue group search
79.10 to 1.80 - Resolution range in file, used for systematic absence check
</pre>
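Thus the space group is P2<sub>1</sub>2<sub>1</sub>2<sub>1</sub> (No. 19).
== Round 2: using the REFERENCE_DATA_SET obtained from one cluster ==
The processing script integrate.rc is changed slightly, to a) use the REFERENCE_DATA_SET (the merged XSCALE.1.HKL, which the script reads as ../reference.hkl, i.e. as reference.hkl in the directory where it is started), b) prevent adjustment of the variances by CORRECT (MINIMUM_I/SIGMA=50; this adjustment is better left to XSCALE), and c) allow some radiation damage correction in XSCALE (NBATCH=3):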
<pre>
#!/bin/bash -f
# Round 2: as round 1, but in P212121 and with the merged cluster-1 data as reference
for f in `seq 1 100`;
do
export OUT=wedge0`printf "%03d" $f`
export NAMES="$PWD/Illuin/microfocus/xtal"`printf "%03d" $f`"_1_00\?.img"
rm -rf $OUT
mkdir $OUT
cd $OUT
generate_XDS.INP $NAMES
# a) reference data set (XSCALE.1.HKL from round 1, copied to reference.hkl)
echo REFERENCE_DATA_SET=../reference.hkl >> XDS.INP
# b) keep CORRECT from adjusting the variances; XSCALE does that instead
echo MINIMUM_I/SIGMA=50 >>XDS.INP
sed -i s"/SPOT_RANGE=1 1/SPOT_RANGE=1 3/" XDS.INP
sed -i s"/SPACE_GROUP_NUMBER=0/SPACE_GROUP_NUMBER=19/" XDS.INP
sed -i s"/UNIT_CELL_CONSTANTS= 70 80 90/UNIT_CELL_CONSTANTS=38.3 79.1 79.1/" XDS.INP
sed -i s"/TRUSTED_REGION=0.0 1.2/TRUSTED_REGION=0 1/" XDS.INP
sed -i s"/INCLUDE_RESOLUTION_RANGE=50 0/INCLUDE_RESOLUTION_RANGE=99 1.8/" XDS.INP
/usr/local/bin/xds_par
cd ..
done
mkdir xscale
cd xscale
cat >XSCALE.INP <<eof
SPACE_GROUP_NUMBER= 19
UNIT_CELL_CONSTANTS= 38.3 79.1 79.1 90 90 90
OUTPUT_FILE=temp.ahkl
SAVE_CORRECTION_IMAGES=FALSE
eof
# "bash -f" (first line) disables globbing; re-enable it so that wedge* is expanded
set +f
# c) NBATCH=3 allows some radiation damage correction in XSCALE
find $PWD/../wedge* -name XDS_ASCII.HKL | awk '{print "INPUT_FILE=",$0;print "NBATCH=3 CORRECTIONS=ALL"}' >> XSCALE.INP
</pre>
and we get in XSCALE.LP:
<pre>
NOTE: Friedel pairs are treated as different reflections.
SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION NUMBER OF REFLECTIONS COMPLETENESS R-FACTOR R-FACTOR COMPARED I/SIGMA R-meas CC(1/2) Anomal SigAno Nano
LIMIT OBSERVED UNIQUE POSSIBLE OF DATA observed expected Corr
8.04 2960 473 476 99.4% 6.2% 5.5% 2955 29.90 6.7% 99.8* 86* 2.824 166
5.68 5486 890 894 99.6% 4.9% 5.9% 5478 27.38 5.3% 99.7* 86* 2.384 363
4.64 6934 1136 1138 99.8% 4.9% 5.8% 6918 27.64 5.4% 99.8* 76* 1.829 480
4.02 8066 1363 1367 99.7% 5.3% 5.9% 8045 26.67 5.9% 99.6* 57* 1.426 590
3.59 9121 1535 1539 99.7% 6.1% 6.3% 9092 25.58 6.7% 99.6* 50* 1.298 666
3.28 10222 1690 1694 99.8% 6.8% 6.8% 10203 24.69 7.5% 99.4* 36* 1.204 751
3.04 10990 1831 1834 99.8% 8.5% 8.0% 10970 21.40 9.3% 99.3* 22* 1.086 827
2.84 12065 1993 1999 99.7% 11.2% 11.1% 12038 17.68 12.2% 99.0* 24* 1.085 894
2.68 12771 2120 2124 99.8% 14.7% 15.1% 12738 14.78 16.1% 98.4* 14* 0.960 952
2.54 13054 2196 2198 99.9% 18.9% 20.2% 13026 12.53 20.8% 97.7* 13* 0.867 995
2.42 14290 2372 2375 99.9% 24.9% 27.1% 14261 10.34 27.3% 96.1* 6 0.813 1083
2.32 14704 2432 2438 99.8% 29.8% 32.5% 14676 9.21 32.6% 95.1* 8 0.843 1115
2.23 15623 2582 2593 99.6% 33.0% 35.0% 15587 8.83 36.1% 93.0* 6 0.831 1180
2.15 15732 2610 2613 99.9% 37.1% 39.2% 15697 8.10 40.6% 91.0* 8 0.818 1203
2.08 16782 2788 2795 99.7% 44.1% 47.0% 16741 7.01 48.3% 88.3* 4 0.797 1276
2.01 16783 2802 2809 99.8% 46.8% 48.7% 16747 6.54 51.2% 89.5* 3 0.807 1293
1.95 18262 3043 3051 99.7% 56.5% 58.0% 18221 5.61 61.9% 85.9* 0 0.803 1402
1.89 17810 2979 2988 99.7% 68.3% 69.8% 17769 4.63 74.8% 80.0* 7 0.864 1374
1.84 18503 3112 3117 99.8% 87.5% 90.3% 18454 3.55 96.0% 69.6* 3 0.838 1435
1.80 16130 2988 3185 93.8% 101.2% 110.5% 15959 2.77 111.7% 62.9* 2 0.798 1276
total 256288 42935 43227 99.3% 13.4% 14.0% 255575 11.63 14.6% 99.6* 21* 0.975 19321
</pre>
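To compare runs, for example round 1 against round 2, the overall statistics can be pulled from the <code>total</code> line of XSCALE.LP. This sketch takes the column positions from the tables shown on this page (completeness in column 5, I/SIGMA in 9, R-meas in 10, CC(1/2) in 11) and stops at the first table after the SIGNAL/NOISE >= -3.0 header. <code>xscale_totals</code> is a hypothetical name, and the column positions are an assumption worth checking against the XDS version in use.

```shell
#!/bin/bash
# Hypothetical helper: print completeness, I/sigma, R-meas and CC(1/2) from the
# "total" line of the first resolution table after the SIGNAL/NOISE >= -3.0 header.
xscale_totals() {   # $1 = XSCALE.LP
  awk '
    /SUBSET OF INTENSITY DATA WITH SIGNAL\/NOISE >= -3.0/ { intable = 1 }
    intable && $1 == "total" {
      printf "completeness %s  I/sigma %s  R-meas %s  CC(1/2) %s\n", $5, $9, $10, $11
      exit
    }
  ' "$1"
}

# usage: xscale_totals xscale/XSCALE.LP
```

Applied to the two merged tables above, this would report I/sigma 2.77 and CC(1/2) 92.0* for the round-1 cluster, and I/sigma 11.63 and CC(1/2) 99.6* for round 2.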
The substructure (4 Se sites, located using the anomalous data to 3 Å) and the structure (198 residues) can now easily be solved with [[ccp4com:SHELX C/D/E|hkl2map]]:
== Result ==
=== SHELXC: anomalous CC<sub>1/2</sub> ===
[[File:Cc12ano.png]]
=== SHELXD: CCall ''versus'' CCweak, and histogram ===
[[File:Ccallcsccweak.png]]
[[File:Histcfom.png]]
=== SHELXE: contrast versus cycle, and PDB with structure ===
[[File:Contrastvscycle.png]]
[[File:Ribbon.png]]
Further optimization of the processing may be possible, but is left as an exercise to the reader.