This article deals with how to process serial synchrotron crystallography (SSX) data.
The particular data we are processing are artificial and were prepared by James Holton. The files Illuin_microfocus_minimal_00[1-3].tar.bz2 can be [http://bl831.als.lbl.gov/example_data_sets/tarballs downloaded]; the data and the problem are described on his [http://bl831.als.lbl.gov/~jamesh/challenge/microfocus microfocus challenge page], and in a [http://journals.iucr.org/d/issues/2019/02/00/ba5297/index.html paper].
The challenges are
# partial data sets: each of the 100 data sets has only 3 good frames of 1° oscillation; later frames have strong radiation damage
# the crystals decay to about 1/2 of their initial diffracting power within these 3 frames
# the b and c axes are the same length, but the simulated crystals are orthorhombic. This makes it difficult to index them consistently: it is wrong to just merge them in an orthorhombic space group without resolving the indexing ambiguity, because that yields a pseudo-tetragonal twinned merged data set.
The solution is to use [[XSCALE]] for scaling, and [[xscale_isocluster]] for analysing the scaled data.
== Round 1: processing the data, and determining the space group ==
In order to be able to merge the data in XSCALE, we must ensure that they are all processed in the same space group, with similar cell parameters. Some exploratory processing (not shown) and averaging of cell parameters reveals that IDXREF finds a primitive lattice with one axis of 38.3 Å, and two with 79.1 Å; angles are 90°. The data go to 1.8 Å; beyond that, the intensities suddenly drop to 0, presumably because James Holton simulated them only that far.
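The averaging step mentioned above is not shown in this article. A minimal sketch of how it could be done is given here, assuming that the six cell parameters of each crystal have already been collected one crystal per line; the function name <code>average_cells</code> and the three example cells are made up for illustration and are not taken from the actual processing.

```shell
# Average unit cell parameters over many crystals.
# Input (stdin): one line per crystal with six numbers: a b c alpha beta gamma.
# Assumption: these lines were gathered beforehand from the indexing results;
# how they are extracted from the XDS output is not covered here.
average_cells() {
  awk 'NF >= 6 {
         for (i = 1; i <= 6; i++) sum[i] += $i
         n++
       }
       END {
         if (n == 0) { print "no cells read" > "/dev/stderr"; exit 1 }
         printf "%.1f %.1f %.1f %.1f %.1f %.1f\n",
                sum[1]/n, sum[2]/n, sum[3]/n, sum[4]/n, sum[5]/n, sum[6]/n
       }'
}

# Example with three invented cells close to the values quoted above
printf '%s\n' "38.2 79.0 79.2 90 90 90" \
              "38.4 79.2 79.0 90 90 90" \
              "38.3 79.1 79.1 90 90 90" | average_cells
```

A plain mean only makes sense if all crystals were indexed with the axes in the same order, which is why the short axis is listed first throughout this article.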
Using the following as the processing script integrate.rc:
<pre>
#!/bin/bash -f
# integrate each of the 100 three-frame wedges in its own directory, in P1
for f in `seq 1 100`;
do
  export OUT=wedge0`printf "%03d" $f`
  export NAMES="$PWD/Illuin/microfocus/xtal"`printf "%03d" $f`"_1_00\?.img"
  rm -rf $OUT
  mkdir $OUT
  cd $OUT
  generate_XDS.INP $NAMES
  # use all 3 frames for spot finding; fix space group, cell and resolution range
  sed -i s"/SPOT_RANGE=1 1/SPOT_RANGE=1 3/" XDS.INP
  sed -i s"/SPACE_GROUP_NUMBER=0/SPACE_GROUP_NUMBER=1/" XDS.INP
  sed -i s"/UNIT_CELL_CONSTANTS= 70 80 90/UNIT_CELL_CONSTANTS=38.3 79.1 79.1/" XDS.INP
  sed -i s"/TRUSTED_REGION=0.0 1.2/TRUSTED_REGION=0 1/" XDS.INP
  sed -i s"/INCLUDE_RESOLUTION_RANGE=50 0/INCLUDE_RESOLUTION_RANGE=99 1.8/" XDS.INP
  /usr/local/bin/xds_par
  cd ..
done
# write an XSCALE.INP that lists every XDS_ASCII.HKL that was produced
mkdir xscale
cd xscale
cat >XSCALE.INP <<eof
SPACE_GROUP_NUMBER= 1
UNIT_CELL_CONSTANTS= 38.3 79.1 79.1 90 90 90
OUTPUT_FILE=temp.ahkl
SAVE_CORRECTION_IMAGES=FALSE
FRIEDEL'S_LAW=TRUE
eof
find $PWD/../wedge* -name XDS_ASCII.HKL | awk '{print "INPUT_FILE=",$0;print "NBATCH=1 CORRECTIONS=ALL"}' >> XSCALE.INP
</pre>
and then running XSCALE, we obtain in P1
<pre>
SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
...
total 256616 81419 86791 93.8% 54.3% 51.3% 243147 1.43 64.0% 64.6* 0 0.788 17331
</pre>
and feed this to pointless:
 pointless xdsin temp.ahkl
which tells us
<pre>
Scores for each symmetry element
...
</pre>
Based on the P4(2)2(1)2 suggestion, we may try to modify the header of XSCALE.INP to
<pre>
SPACE_GROUP_NUMBER= 94
UNIT_CELL_CONSTANTS= 79.1 79.1 38.3 90 90 90
...
FRIEDEL'S_LAW=TRUE
REIDX=0 1 0 0 0 0 1 0 1 0 0 0
</pre>
where the last line permutes the axes into the order k,l,h (after all, the XDS_ASCII.HKL files are in P1 with a,b,c of 38.3, 79.1, 79.1), and obtain
<pre>
SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
...
total 256446 11815 11855 99.7% 64.9% 51.6% 256436 3.61 66.5% 97.9* 1 0.910 9805
</pre>
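The axis permutation encoded by the REIDX line can be checked by hand. The sketch below applies the twelve numbers to one Miller index triple, assuming the usual reading of the operator as three rows of four numbers (h' = r1*h + r2*k + r3*l + r4, and likewise for k' and l'); the helper name <code>apply_reidx</code> is hypothetical and is not part of XDS or XSCALE.

```shell
# Apply a 12-number REIDX-style operator to one Miller index triple.
# Assumed convention: h' = r1*h + r2*k + r3*l + r4,
#                     k' = r5*h + r6*k + r7*l + r8,
#                     l' = r9*h + r10*k + r11*l + r12.
apply_reidx() {   # usage: apply_reidx "r1 ... r12" h k l
  echo "$1" | awk -v h="$2" -v k="$3" -v l="$4" '{
    print $1*h + $2*k + $3*l + $4, $5*h + $6*k + $7*l + $8, $9*h + $10*k + $11*l + $12
  }'
}

# (h,k,l) = (1,2,3) in the P1 setting becomes (k,l,h) = (2,3,1)
apply_reidx "0 1 0 0 0 0 1 0 1 0 0 0" 1 2 3
```

Under this reading the 38.3 Å axis, which is a in the P1 setting, ends up as c, in agreement with the UNIT_CELL_CONSTANTS line of the modified header.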
Analysis with
 xscale_isocluster -dim 2 -clu 2 temp.ahkl
yields an iso.pdb which does not show a single compact cluster, but one severely elongated cloud:
[[File:1g1c-94.png]]
(If the space group were correct, the result of [[xscale_isocluster]] would look similar to this:
[[File:Lyso-xscale-isocluster.png]]
which is from a lysozyme SSX data collection performed at the SLS; outliers are labelled. In this case, the data are truly tetragonal.)
We must now investigate whether the data have lower than tetragonal symmetry.
XSCALEing with
 SPACE_GROUP_NUMBER=16
 UNIT_CELL_CONSTANTS=38.3 79.1 79.1 90 90 90
gives a new temp.ahkl, with orthorhombic symmetry.
 xscale_isocluster -dim 2 -clu 2 temp.ahkl
gives
<pre>
psi= 0.1692468 nhalo= 0
cluster: 1 center: 2 elements: 51 core: 51 halo: 0
cluster: 2 center: 6 elements: 49 core: 49 halo: 0
</pre>
and prepares XSCALE.1.INP (and XSCALE.2.INP) for further use (these two files collect the differently, but internally consistently, indexed XDS_ASCII.HKL files).
 coot iso.pdb
shows
[[File:Coot.png]]
and thus reveals two well separated clouds, corresponding to the two possible indexing modes of the data in an orthorhombic space group.
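The cluster sizes reported by xscale_isocluster can be tallied to confirm that every data set was assigned to one of the two indexing modes. This is a minimal sketch that only relies on the layout of the "cluster:" lines quoted above; the function name <code>cluster_sizes</code> is made up, and other versions of the program may print the lines differently.

```shell
# Print "cluster-number size" for each cluster line, then the total.
# Assumes lines of the form: cluster: N center: C elements: E core: ... halo: ...
cluster_sizes() {
  awk '$1 == "cluster:" && $5 == "elements:" { print $2, $6; total += $6 }
       END { print "total", total + 0 }'
}

# Example using the three output lines shown above
printf '%s\n' \
  " psi= 0.1692468 nhalo= 0" \
  " cluster: 1 center: 2 elements: 51 core: 51 halo: 0" \
  " cluster: 2 center: 6 elements: 49 core: 49 halo: 0" | cluster_sizes
```

For the output quoted above the two sizes add up to 100, so none of the 100 wedges was left unassigned.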
Using XSCALE.1.INP with its 51 XDS_ASCII.HKL, and FRIEDEL'S_LAW=TRUE, we get
<pre>
SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
...
1.80 7927 1570 1701 92.3% 104.8% 225.2% 7811 0.70 116.1% 42.6* -5 0.425 785
total 130885 22874 23173 98.7% 38.3% 41.0% 130103 2.77 42.1% 92.0* 3 0.703 13546
</pre>
At this point, we run
 xdscc12 -w XSCALE.1.HKL | grep ^a | sort -nk6
and find that data sets 1 and 17 are wrongly included in the cloud of 51 data sets. Thus they are removed manually from XSCALE.INP.
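The manual removal can also be scripted. The sketch below drops data sets by their position in the input file, under the assumption that data set numbers count the INPUT_FILE= entries in order and that the option lines belonging to an entry (such as NBATCH=) follow it directly; the function name <code>drop_datasets</code> and the output file name in the usage line are hypothetical.

```shell
# Remove the N-th INPUT_FILE= entries (and the lines belonging to them) from an
# XSCALE input file read on stdin.
# Assumption: data set numbers correspond to the order of INPUT_FILE= lines.
drop_datasets() {   # usage: drop_datasets "1 17" < XSCALE.INP > XSCALE.new.INP
  awk -v drop="$1" '
    BEGIN { iset = 0; n = split(drop, d, " "); for (i = 1; i <= n; i++) skip[d[i]] = 1 }
    /^[ \t]*INPUT_FILE=/ { iset++ }   # a new data set starts here
    (iset in skip) { next }           # inside a dropped data set: print nothing
    { print }
  '
}
```

Writing the result to a new file and comparing it with the original (for example with diff) before running XSCALE again is a cheap check that only the intended entries were removed.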
After <code>xscale_isocluster -dim 2 -clu 1</code>,
 coot iso.pdb
now reveals a single cloud:
[[File:1g1c-19.png]]
We then re-run XSCALE with MERGE=TRUE. The resulting reflection output file XSCALE.1.HKL is then used as REFERENCE_DATA_SET for a second round of integration with XDS.
 pointless xdsin XSCALE.1.HKL
gives
<pre>
Spacegroup TotProb SysAbsProb Reindex Conditions
P 21 21 21 ( 19) 0.896 0.924 h00: h=2n, 0k0: k=2n, 00l: l=2n (zones 1,2,3)
..........
P 2 21 21 ( 18) 0.044 0.045 0k0: k=2n, 00l: l=2n (zones 2,3)
..........
P 21 21 2 ( 18) 0.015 0.015 h00: h=2n, 0k0: k=2n (zones 1,2)
..........
P 21 2 21 ( 18) 0.014 0.014 h00: h=2n, 00l: l=2n (zones 1,3)
---------------------------------------------------------------
Space group confidence (= Sqrt(Score * (Score - NextBestScore))) = 0.87
Laue group confidence (= Sqrt(Score * (Score - NextBestScore))) = 0.97
Selecting space group P 21 21 21 as there is a single space group with the highest score
<!--SUMMARY_BEGIN--> $TEXT:Result: $$ $$
Best Solution: space group P 21 21 21
Reindex operator: [h,k,l]
Laue group probability: 0.970
Systematic absence probability: 0.924
Total probability: 0.896
Space group confidence: 0.874
Laue group confidence 0.966
Unit cell: 38.30 79.10 79.10 90.00 90.00 90.00
79.10 to 2.47 - Resolution range used for Laue group search
79.10 to 1.80 - Resolution range in file, used for systematic absence check
</pre>
Thus we now know the space group.
== Round 2: using the REFERENCE_DATA_SET obtained from one cluster ==
The processing script integrate.rc is changed a bit, to a) use the REFERENCE_DATA_SET, b) prevent adjustment of variances by CORRECT (MINIMUM_I/SIGMA=50; this should rather be done by XSCALE), c) allow some radiation damage correction in XSCALE (NBATCH=3):
<pre>
#!/bin/bash -f
# re-integrate all wedges in space group 19, indexed consistently with the reference
for f in `seq 1 100`;
do
  export OUT=wedge0`printf "%03d" $f`
  export NAMES="$PWD/Illuin/microfocus/xtal"`printf "%03d" $f`"_1_00\?.img"
  rm -rf $OUT
  mkdir $OUT
  cd $OUT
  generate_XDS.INP $NAMES
  echo REFERENCE_DATA_SET=../reference.hkl >> XDS.INP
  echo MINIMUM_I/SIGMA=50 >>XDS.INP
  sed -i s"/SPOT_RANGE=1 1/SPOT_RANGE=1 3/" XDS.INP
  sed -i s"/SPACE_GROUP_NUMBER=0/SPACE_GROUP_NUMBER=19/" XDS.INP
  sed -i s"/UNIT_CELL_CONSTANTS= 70 80 90/UNIT_CELL_CONSTANTS=38.3 79.1 79.1/" XDS.INP
  sed -i s"/TRUSTED_REGION=0.0 1.2/TRUSTED_REGION=0 1/" XDS.INP
  sed -i s"/INCLUDE_RESOLUTION_RANGE=50 0/INCLUDE_RESOLUTION_RANGE=99 1.8/" XDS.INP
  /usr/local/bin/xds_par
  cd ..
done
# write an XSCALE.INP that lists every XDS_ASCII.HKL, now with 3 batches per data set
mkdir xscale
cd xscale
cat >XSCALE.INP <<eof
SPACE_GROUP_NUMBER= 19
UNIT_CELL_CONSTANTS= 38.3 79.1 79.1 90 90 90
OUTPUT_FILE=temp.ahkl
SAVE_CORRECTION_IMAGES=FALSE
eof
find $PWD/../wedge* -name XDS_ASCII.HKL | awk '{print "INPUT_FILE=",$0;print "NBATCH=3 CORRECTIONS=ALL"}' >> XSCALE.INP
</pre>
and we get as XSCALE.LP:
<pre>
 NOTE: Friedel pairs are treated as different reflections.

 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS   COMPLETENESS  R-FACTOR  R-FACTOR COMPARED I/SIGMA    R-meas  CC(1/2) Anomal  SigAno    Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                       Corr

     8.04        2960     473       476       99.4%       6.2%      5.5%     2955   29.90      6.7%    99.8*    86*   2.824     166
     5.68        5486     890       894       99.6%       4.9%      5.9%     5478   27.38      5.3%    99.7*    86*   2.384     363
     4.64        6934    1136      1138       99.8%       4.9%      5.8%     6918   27.64      5.4%    99.8*    76*   1.829     480
     4.02        8066    1363      1367       99.7%       5.3%      5.9%     8045   26.67      5.9%    99.6*    57*   1.426     590
     3.59        9121    1535      1539       99.7%       6.1%      6.3%     9092   25.58      6.7%    99.6*    50*   1.298     666
     3.28       10222    1690      1694       99.8%       6.8%      6.8%    10203   24.69      7.5%    99.4*    36*   1.204     751
     3.04       10990    1831      1834       99.8%       8.5%      8.0%    10970   21.40      9.3%    99.3*    22*   1.086     827
     2.84       12065    1993      1999       99.7%      11.2%     11.1%    12038   17.68     12.2%    99.0*    24*   1.085     894
     2.68       12771    2120      2124       99.8%      14.7%     15.1%    12738   14.78     16.1%    98.4*    14*   0.960     952
     2.54       13054    2196      2198       99.9%      18.9%     20.2%    13026   12.53     20.8%    97.7*    13*   0.867     995
     2.42       14290    2372      2375       99.9%      24.9%     27.1%    14261   10.34     27.3%    96.1*     6    0.813    1083
     2.32       14704    2432      2438       99.8%      29.8%     32.5%    14676    9.21     32.6%    95.1*     8    0.843    1115
     2.23       15623    2582      2593       99.6%      33.0%     35.0%    15587    8.83     36.1%    93.0*     6    0.831    1180
     2.15       15732    2610      2613       99.9%      37.1%     39.2%    15697    8.10     40.6%    91.0*     8    0.818    1203
     2.08       16782    2788      2795       99.7%      44.1%     47.0%    16741    7.01     48.3%    88.3*     4    0.797    1276
     2.01       16783    2802      2809       99.8%      46.8%     48.7%    16747    6.54     51.2%    89.5*     3    0.807    1293
     1.95       18262    3043      3051       99.7%      56.5%     58.0%    18221    5.61     61.9%    85.9*     0    0.803    1402
     1.89       17810    2979      2988       99.7%      68.3%     69.8%    17769    4.63     74.8%    80.0*     7    0.864    1374
     1.84       18503    3112      3117       99.8%      87.5%     90.3%    18454    3.55     96.0%    69.6*     3    0.838    1435
     1.80       16130    2988      3185       93.8%     101.2%    110.5%    15959    2.77    111.7%    62.9*     2    0.798    1276
    total      256288   42935     43227       99.3%      13.4%     14.0%   255575   11.63     14.6%    99.6*    21*   0.975   19321
</pre>
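To compare processing rounds quickly, the "total" line of such a table can be condensed into a few numbers. The sketch below assumes the column order of the table above (observed and unique reflections in columns 2 and 3, completeness in column 5, I/SIGMA in column 9, CC(1/2) in column 11) and uses the last "total" line it finds; the function name <code>xscale_total</code> is made up, and which table of a real XSCALE.LP is the relevant one should be checked by eye.

```shell
# Summarise the last "total" line of an XSCALE statistics table.
# Assumed columns: 2 = observed, 3 = unique, 5 = completeness,
#                  9 = I/sigma, 11 = CC(1/2).
xscale_total() {   # usage: xscale_total XSCALE.LP   (or read from stdin)
  awk '$1 == "total" && NF >= 11 { line = $0 }
       END {
         if (line == "") exit 1
         split(line, f, " ")
         printf "multiplicity=%.2f completeness=%s I/sigma=%s CC1/2=%s\n",
                f[2]/f[3], f[5], f[9], f[11]
       }' "$@"
}

# Example with the total line of the table above
echo "total 256288 42935 43227 99.3% 13.4% 14.0% 255575 11.63 14.6% 99.6* 21* 0.975 19321" | xscale_total
```

Applied to the total lines quoted earlier in this article, the same summary shows how CC(1/2) and I/SIGMA improve from the P1 run to the final P2(1)2(1)2(1) run.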
The substructure (locating 4 Se with anomalous data to 3 Å) and the structure (198 residues) can now easily be solved with [[ccp4com:SHELX C/D/E|hkl2map]]:
== Result ==
=== SHELXC: anomalous CC<sub>1/2</sub> ===
[[File:Cc12ano.png]]
=== SHELXD: CCall ''versus'' CCweak, and histogram ===
[[File:Ccallcsccweak.png]]
[[File:Histcfom.png]]
=== SHELXE: contrast ''versus'' cycle, and PDB with structure ===
[[File:Contrastvscycle.png]]
[[File:Ribbon.png]]
Further optimization of processing may be possible, but is left as an exercise to the reader.