SSX

Revision as of 5 August 2019
This article deals with how to process serial synchrotron crystallography (SSX) data.
The particular data we are processing are artificial and were prepared by James Holton. The files Illuin_microfocus_minimal_00[1-3].tar.bz2 can be [http://bl831.als.lbl.gov/example_data_sets/tarballs downloaded] and the data and problem are described on his [http://bl831.als.lbl.gov/~jamesh/challenge/microfocus microfocus challenge page], and in a [http://journals.iucr.org/d/issues/2019/02/00/ba5297/index.html paper].
The challenges are:
# partial data sets: each of the 100 data sets has only 3 good frames of 1° oscillation; later frames suffer from strong radiation damage
# the crystals decay to about half of their initial intensity within these 3 frames
# the b and c axes have the same length, but the simulated crystals are orthorhombic. This makes it difficult to index them consistently: it is wrong to simply merge them in an orthorhombic space group without resolving the indexing ambiguity, because that yields a pseudo-tetragonal, twinned merged data set.
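In terms of Miller indices, the ambiguity can be written down explicitly. This is a sketch derived from the cell given below, not taken from the challenge description. Because b = c and all angles are 90°, the lattice has tetragonal metric symmetry. A 90° rotation about a therefore maps the lattice onto itself without being a symmetry of the orthorhombic crystal. Each data set can thus be indexed in one of two ways, related by
<math>\begin{pmatrix}h'\\ k'\\ l'\end{pmatrix} = \begin{pmatrix}1&0&0\\ 0&0&1\\ 0&-1&0\end{pmatrix}\begin{pmatrix}h\\ k\\ l\end{pmatrix}</math>
Up to the orthorhombic Laue symmetry (mmm), this simply exchanges k and l. Merging both indexing modes without resolving the ambiguity therefore averages I(h,k,l) with I(h,l,k), which produces the pseudo-tetragonal twinned data set.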
== Round 1: processing the data, and determining the space group ==
In order to merge the data in XSCALE, we must ensure that they are all processed in the same space group, with similar cell parameters. Some exploratory processing (not shown) and averaging of cell parameters reveal that IDXREF finds a primitive lattice with one axis of 38.3 Å and two axes of 79.1 Å; all angles are 90°. The data extend to 1.8 Å. Beyond that, the intensities suddenly drop to 0, presumably because James Holton simulated them only to that resolution.
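The averaging of cell parameters mentioned above might be done along the following lines. This is a minimal sketch: the function name <code>average_cell</code> is hypothetical. It assumes that each XDS_ASCII.HKL header contains a line of the form <code>!UNIT_CELL_CONSTANTS= a b c alpha beta gamma</code>. In P1 the short axis may come out as a, b or c depending on the wedge, so the three edge lengths of each file are sorted before averaging. The angles are not averaged.

```shell
#!/bin/bash
# Hypothetical helper: average the cell edges found in the headers of
# XDS_ASCII.HKL files (assumes a header line "!UNIT_CELL_CONSTANTS= a b c al be ga").
# The three edges of each file are sorted (shortest first), so that wedges whose
# axes came out in a different order are still averaged consistently.
average_cell() {
  awk '/^!UNIT_CELL_CONSTANTS=/ {
         # copy the three edges and sort them in ascending order
         x[1] = $2; x[2] = $3; x[3] = $4
         for (i = 1; i <= 2; i++)
           for (j = i + 1; j <= 3; j++)
             if (x[j] + 0 < x[i] + 0) { t = x[i]; x[i] = x[j]; x[j] = t }
         for (i = 1; i <= 3; i++) s[i] += x[i]
         n++
       }
       END {
         if (n == 0) { print "no cells found" > "/dev/stderr"; exit 1 }
         # number of files read, then the averaged sorted edges
         printf "%d cells: %.1f %.1f %.1f\n", n, s[1] / n, s[2] / n, s[3] / n
       }' "$@"
}
# usage: average_cell wedge0*/XDS_ASCII.HKL
```

For the data here, the result should show one short edge and two long edges, close to the 38.3 Å and 79.1 Å quoted above.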
Using the following as the processing script integrate.rc:
<pre>
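#!/bin/bash -f
# process each of the 100 crystals (xtal001 ... xtal100) in its own directory wedge0001 ... wedge0100
for f in `seq 1 100`;
do
export OUT=wedge0`printf "%03d" $f`
export NAMES="$PWD/Illuin/microfocus/xtal"`printf "%03d" $f`"_1_00\?.img"
rm -rf $OUT
mkdir $OUT
cd $OUT
generate_XDS.INP $NAMES
# use frames 1 to 3 for the spot search
sed -i s"/SPOT_RANGE=1 1/SPOT_RANGE=1 3/" XDS.INP
# process in P1, with the averaged cell from the exploratory processing
sed -i s"/SPACE_GROUP_NUMBER=0/SPACE_GROUP_NUMBER=1/" XDS.INP
sed -i s"/UNIT_CELL_CONSTANTS= 70 80 90/UNIT_CELL_CONSTANTS=38.3 79.1 79.1/" XDS.INP
# reduce the outer TRUSTED_REGION limit from the generate_XDS.INP default of 1.2 to 1
sed -i s"/TRUSTED_REGION=0.0 1.2/TRUSTED_REGION=0 1/" XDS.INP
# the simulated data end at 1.8 Å
sed -i s"/INCLUDE_RESOLUTION_RANGE=50 0/INCLUDE_RESOLUTION_RANGE=99 1.8/" XDS.INP
/usr/local/bin/xds_par
cd ..
done
# prepare a P1 scaling job over all wedges; XSCALE is then run in the xscale directory
mkdir xscale
cd xscale
cat >XSCALE.INP <<eof
SPACE_GROUP_NUMBER= 1
UNIT_CELL_CONSTANTS= 38.3 79.1 79.1 90 90 90
OUTPUT_FILE=temp.ahkl
SAVE_CORRECTION_IMAGES=FALSE
FRIEDEL'S_LAW=TRUE
eof
# append one INPUT_FILE= line, followed by its per-file parameters, for every XDS_ASCII.HKL
find $PWD/../wedge* -name XDS_ASCII.HKL | awk '{print "INPUT_FILE=",$0;print "NBATCH=1 CORRECTIONS=ALL"}' >> XSCALE.INP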
</pre>
we obtain in P1
<pre>
  SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 ...
     total      256616  81419    86791      93.8%      54.3%    51.3%  243147    1.43    64.0%    64.6*    0    0.788  17331
</pre>
 
and feed this to pointless:
 
  pointless xdsin temp.ahkl
 
which tells us
<pre>
Scores for each symmetry element
...
</pre>
 
Based on the P4(2)2(1)2 suggestion, we may try to modify the header of XSCALE.INP to
<pre>
SPACE_GROUP_NUMBER= 94
UNIT_CELL_CONSTANTS= 79.1 79.1 38.3 90 90 90
OUTPUT_FILE=temp.ahkl
SAVE_CORRECTION_IMAGES=FALSE
FRIEDEL'S_LAW=TRUE
REIDX=0 1 0 0  0 0 1 0  1 0 0 0
</pre>
where the last line reorders the axes to k,l,h (after all, the XDS_ASCII.HKL files are in P1 with a,b,c = 38.3, 79.1, 79.1 Å), and obtain
<pre>
  SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 ...
     total      256446  11815    11855      99.7%      64.9%    51.6%  256436    3.61    66.5%    97.9*    1    0.910    9805
</pre>
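The effect of the REIDX line in the modified XSCALE.INP header can be checked by hand. The sketch below is a hypothetical helper (not part of XDS) that applies the first three numbers of each row of the 12 REIDX numbers to one index triple. It assumes the numbers are given row by row, and it ignores the fourth number of each row, which is 0 here. Applied to (1,2,3), the REIDX above gives (2,3,1), i.e. the new indices are (k,l,h). The matrix is a cyclic permutation with determinant +1, so the handedness of the axis system is preserved.

```shell
#!/bin/bash
# Hypothetical helper (not part of XDS): apply the 3x3 part of a REIDX matrix
# to one reflection index.
# Arguments 1-12: the REIDX numbers, row by row (4th number of each row ignored);
# arguments 13-15: h k l.  Prints the new h k l.
reidx() {
  local -a m=("${@:1:12}")
  local h=${13} k=${14} l=${15} r
  local -a out=()
  for r in 0 1 2; do
    # row r of the matrix occupies m[4r] .. m[4r+3]
    out+=( $(( m[4*r]*h + m[4*r+1]*k + m[4*r+2]*l )) )
  done
  echo "${out[@]}"
}
# usage: reidx 0 1 0 0  0 0 1 0  1 0 0 0  1 2 3
```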
Analysis with
  xscale_isocluster -dim 2 -clu 2 temp.ahkl
yields an iso.pdb that is far from a compact cluster; instead, it shows a single, severely elongated cloud:
 
[[File:1g1c-94.png]]
 
(If the space group were correct, the result of [[xscale_isocluster]] should look similar to this:
 
[[File:Lyso-xscale-isocluster.png]]
 
which is from a lysozyme SSX data collection performed at the SLS; outliers are labelled. In this case, the data are truly tetragonal.)
 
We must now investigate whether the data have lower than tetragonal symmetry.
Running XSCALE with
  SPACE_GROUP_NUMBER=16
  UNIT_CELL_CONSTANTS=38.3 79.1 79.1 90 90 90
gives a new temp.ahkl with orthorhombic symmetry. Then
  xscale_isocluster -dim 2 -clu 2 temp.ahkl
gives
<pre>
  psi=  0.1692468      nhalo=          0
cluster:  1 center:    2 elements:    51 core:    51 halo:    0
cluster:  2 center:    6 elements:    49 core:    49 halo:    0
</pre>
and prepares XSCALE.1.INP (and XSCALE.2.INP) for further use. These two files collect the XDS_ASCII.HKL files that are indexed differently from each other, but consistently within each file.
  coot iso.pdb
shows
[[File:Coot.png]]
 
and thus reveals two well-separated clouds, corresponding to the two possible indexing modes of the data in an orthorhombic space group.
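The split into two groups can be cross-checked by counting the data sets in the two files written by xscale_isocluster. The following is a trivial sketch: the function name <code>count_datasets</code> is hypothetical, and it assumes one INPUT_FILE= line per data set. For this example, the counts should match the cluster sizes of 51 and 49 reported above.

```shell
#!/bin/bash
# Hypothetical helper: print, for each XSCALE input file given, its name and the
# number of data sets it lists (assumes one "INPUT_FILE=" line per data set).
count_datasets() {
  local f
  for f in "$@"; do
    printf '%s %s\n' "$f" "$(grep -c '^ *INPUT_FILE=' "$f")"
  done
}
# usage: count_datasets XSCALE.1.INP XSCALE.2.INP
```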


Using XSCALE.1.INP with its 51 XDS_ASCII.HKL files, and FRIEDEL'S_LAW=TRUE, we get
<pre>
  SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 ...
     1.80        7927    1570      1701      92.3%    104.8%    225.2%    7811    0.70    116.1%    42.6*    -5    0.425    785
     total      130885  22874    23173      98.7%      38.3%    41.0%  130103    2.77    42.1%    92.0*    3    0.703  13546
</pre>
At this point, we run
  xdscc12 -w XSCALE.1.HKL | grep ^a | sort -nk6
and find that data sets 1 and 17 are wrongly included in the cloud of 51 data sets. They are therefore removed manually from XSCALE.INP.
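Removing entries by hand is error-prone, so the step could also be scripted. The sketch below is a hypothetical helper that drops the N-th INPUT_FILE entries from an XSCALE input file. It rests on two assumptions:
* the data set numbers reported by xdscc12 follow the order of the INPUT_FILE= lines;
* each entry consists of its INPUT_FILE= line plus the lines following it up to the next INPUT_FILE= line, as in the XSCALE.INP written by integrate.rc.
Everything before the first INPUT_FILE= line is kept as the header.

```shell
#!/bin/bash
# Hypothetical helper: write an XSCALE input file to stdout without the data sets
# whose ordinal numbers (1 = first INPUT_FILE= entry) are given as arguments.
# Assumes an entry = its INPUT_FILE= line + following lines up to the next INPUT_FILE=.
drop_datasets() {
  local inp=$1
  shift
  awk -v drop="$*" '
    BEGIN { n = split(drop, d, " "); for (i = 1; i <= n; i++) skip[d[i]] = 1 }
    /^ *INPUT_FILE=/ { k++ }      # a new data set entry starts on this line
    !(k in skip) { print }        # header lines (k unset) and kept entries
  ' "$inp"
}
# usage: drop_datasets XSCALE.INP 1 17 > XSCALE.new && mv XSCALE.new XSCALE.INP
```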
After <code>xscale_isocluster -dim 2 -clu 1</code>,
  coot iso.pdb
now reveals a single cloud:
[[File:1g1c-19.png]]
We then re-run XSCALE with MERGE=TRUE. The resulting reflection output file XSCALE.1.HKL is then used as REFERENCE_DATA_SET for a second round of integration with XDS.
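The MERGE=TRUE step and the hand-over to the second round could be sketched as follows. The helper name <code>add_merge</code> is hypothetical. It assumes one parameter per line in XSCALE.INP and that MERGE= belongs after the OUTPUT_FILE= line it refers to. The copy to reference.hkl is an assumption made to match the <code>../reference.hkl</code> path used in the round-2 integrate.rc; that path is read from inside each wedge directory, i.e. from the top-level directory.

```shell
#!/bin/bash
# Hypothetical helper: make every OUTPUT_FILE= entry of an XSCALE input file merged.
# Any existing MERGE= line is dropped and MERGE=TRUE is written after each
# OUTPUT_FILE= line (assumes one parameter per line).
add_merge() {
  local inp=$1
  awk '/^ *MERGE=/ { next }
       { print }
       /^ *OUTPUT_FILE=/ { print "MERGE=TRUE" }' "$inp" > "$inp.tmp" &&
    mv "$inp.tmp" "$inp"
}
# usage, inside the xscale directory:
#   add_merge XSCALE.INP && xscale_par && cp XSCALE.1.HKL ../reference.hkl
```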
  pointless xdsin XSCALE.1.HKL
gives
<pre>
  Spacegroup        TotProb SysAbsProb    Reindex        Conditions
    P 21 21 21 ( 19)    0.896  0.924                        h00: h=2n, 0k0: k=2n, 00l: l=2n (zones 1,2,3)
    ..........
    P 2 21 21 ( 18)    0.044  0.045                        0k0: k=2n, 00l: l=2n (zones 2,3)
    ..........
    P 21 21 2 ( 18)    0.015  0.015                        h00: h=2n, 0k0: k=2n (zones 1,2)
    ..........
    P 21 2 21 ( 18)    0.014  0.014                        h00: h=2n, 00l: l=2n (zones 1,3)
---------------------------------------------------------------
Space group confidence (= Sqrt(Score * (Score - NextBestScore))) =    0.87
Laue group confidence  (= Sqrt(Score * (Score - NextBestScore))) =    0.97
Selecting space group P 21 21 21 as there is a single space group with the highest score
<!--SUMMARY_BEGIN--> $TEXT:Result: $$ $$
Best Solution:    space group P 21 21 21
  Reindex operator:                  [h,k,l]               
  Laue group probability:            0.970
  Systematic absence probability:    0.924
  Total probability:                  0.896
  Space group confidence:            0.874
  Laue group confidence              0.966
  Unit cell:  38.30  79.10  79.10    90.00  90.00  90.00
  79.10 to  2.47  - Resolution range used for Laue group search
  79.10 to  1.80  - Resolution range in file, used for systematic absence check
</pre>
Thus we now know the space group: P2<sub>1</sub>2<sub>1</sub>2<sub>1</sub> (number 19).
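The confidence values printed by pointless can be reproduced from the scores listed above. This assumes that "Score" in the printed formula refers to the total probability of each candidate space group. The best candidate (P 21 21 21, 0.896) and the next-best (P 2 21 21, 0.044) then give
<math>\sqrt{0.896 \times (0.896 - 0.044)} = \sqrt{0.7634} \approx 0.874,</math>
in agreement with the reported space group confidence of 0.874.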
== Round 2: using the REFERENCE_DATA_SET obtained from one cluster ==
The processing script integrate.rc is changed slightly, to a) use the REFERENCE_DATA_SET, b) prevent the adjustment of variances by CORRECT (this should rather be done by XSCALE), and c) allow some radiation damage correction in XSCALE:
<pre>
#!/bin/bash -f
for f in `seq 1 100`;
do
export OUT=wedge0`printf "%03d" $f`
export NAMES="$PWD/Illuin/microfocus/xtal"`printf "%03d" $f`"_1_00\?.img"
rm -rf $OUT
mkdir $OUT
cd $OUT
generate_XDS.INP $NAMES
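# a) reference data set: the merged round-1 result, expected as reference.hkl in the top-level directory
echo REFERENCE_DATA_SET=../reference.hkl >> XDS.INP
# b) keep CORRECT from adjusting the variances; this is left to XSCALE
echo MINIMUM_I/SIGMA=50 >>XDS.INP
sed -i s"/SPOT_RANGE=1 1/SPOT_RANGE=1 3/" XDS.INP
# the space group determined in round 1: P212121
sed -i s"/SPACE_GROUP_NUMBER=0/SPACE_GROUP_NUMBER=19/" XDS.INP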
sed -i s"/UNIT_CELL_CONSTANTS= 70 80 90/UNIT_CELL_CONSTANTS=38.3 79.1 79.1/" XDS.INP
sed -i s"/TRUSTED_REGION=0.0 1.2/TRUSTED_REGION=0 1/" XDS.INP
sed -i s"/INCLUDE_RESOLUTION_RANGE=50 0/INCLUDE_RESOLUTION_RANGE=99 1.8/" XDS.INP
/usr/local/bin/xds_par
cd ..
done
mkdir xscale
cd xscale
cat >XSCALE.INP <<eof
SPACE_GROUP_NUMBER= 19
UNIT_CELL_CONSTANTS= 38.3 79.1 79.1 90 90 90
OUTPUT_FILE=temp.ahkl
SAVE_CORRECTION_IMAGES=FALSE
eof
# c) NBATCH=3: several scaling batches per data set, allowing some radiation damage correction
find $PWD/../wedge* -name XDS_ASCII.HKL | awk '{print "INPUT_FILE=",$0;print "NBATCH=3 CORRECTIONS=ALL"}' >> XSCALE.INP
</pre>
and XSCALE.LP reports:
<pre>
      NOTE:      Friedel pairs are treated as different reflections.
SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION    NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA  R-meas  CC(1/2)  Anomal  SigAno  Nano
  LIMIT    OBSERVED  UNIQUE  POSSIBLE    OF DATA  observed  expected                                      Corr
    8.04        2960    473      476      99.4%      6.2%      5.5%    2955  29.90      6.7%    99.8*    86*  2.824    166
    5.68        5486    890      894      99.6%      4.9%      5.9%    5478  27.38      5.3%    99.7*    86*  2.384    363
    4.64        6934    1136      1138      99.8%      4.9%      5.8%    6918  27.64      5.4%    99.8*    76*  1.829    480
    4.02        8066    1363      1367      99.7%      5.3%      5.9%    8045  26.67      5.9%    99.6*    57*  1.426    590
    3.59        9121    1535      1539      99.7%      6.1%      6.3%    9092  25.58      6.7%    99.6*    50*  1.298    666
    3.28      10222    1690      1694      99.8%      6.8%      6.8%    10203  24.69      7.5%    99.4*    36*  1.204    751
    3.04      10990    1831      1834      99.8%      8.5%      8.0%    10970  21.40      9.3%    99.3*    22*  1.086    827
    2.84      12065    1993      1999      99.7%      11.2%    11.1%    12038  17.68    12.2%    99.0*    24*  1.085    894
    2.68      12771    2120      2124      99.8%      14.7%    15.1%    12738  14.78    16.1%    98.4*    14*  0.960    952
    2.54      13054    2196      2198      99.9%      18.9%    20.2%    13026  12.53    20.8%    97.7*    13*  0.867    995
    2.42      14290    2372      2375      99.9%      24.9%    27.1%    14261  10.34    27.3%    96.1*    6    0.813    1083
    2.32      14704    2432      2438      99.8%      29.8%    32.5%    14676    9.21    32.6%    95.1*    8    0.843    1115
    2.23      15623    2582      2593      99.6%      33.0%    35.0%    15587    8.83    36.1%    93.0*    6    0.831    1180
    2.15      15732    2610      2613      99.9%      37.1%    39.2%    15697    8.10    40.6%    91.0*    8    0.818    1203
    2.08      16782    2788      2795      99.7%      44.1%    47.0%    16741    7.01    48.3%    88.3*    4    0.797    1276
    2.01      16783    2802      2809      99.8%      46.8%    48.7%    16747    6.54    51.2%    89.5*    3    0.807    1293
    1.95      18262    3043      3051      99.7%      56.5%    58.0%    18221    5.61    61.9%    85.9*    0    0.803    1402
    1.89      17810    2979      2988      99.7%      68.3%    69.8%    17769    4.63    74.8%    80.0*    7    0.864    1374
    1.84      18503    3112      3117      99.8%      87.5%    90.3%    18454    3.55    96.0%    69.6*    3    0.838    1435
    1.80      16130    2988      3185      93.8%    101.2%    110.5%    15959    2.77    111.7%    62.9*    2    0.798    1276
    total      256288  42935    43227      99.3%      13.4%    14.0%  255575  11.63    14.6%    99.6*    21*  0.975  19321
</pre>
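The anomalous correlation column can guide the choice of a resolution cutoff for substructure solution. The following hypothetical helper reads the XSCALE.LP statistics table (14 columns, as above). It scans the shells from low to high resolution and prints the high-resolution limit of the last shell before the first "Anomal Corr" value lacking XSCALE's significance star. If XSCALE.LP contains several such tables, the last one is used. For the table above it reports 2.54 Å. The 3 Å used for locating the Se sites is somewhat more conservative.

```shell
#!/bin/bash
# Hypothetical helper: from the "SIGNAL/NOISE >= -3.0" resolution table of an
# XSCALE.LP (14 columns; column 12 = Anomal Corr), print the high-resolution
# limit of the last consecutive shell, from low resolution, whose Anomal Corr
# carries a "*". Each new table header resets the scan, so the last table wins.
anom_limit() {
  awk '
    /SUBSET OF INTENSITY DATA WITH SIGNAL\/NOISE >= -3.0/ { intable = 1; done = 0; lim = "" }
    intable && $1 == "total" { intable = 0 }
    intable && !done && NF == 14 && $1 ~ /^[0-9]+\.[0-9]+$/ {
      if ($12 ~ /\*$/) lim = $1; else done = 1   # stop at the first unstarred shell
    }
    END { if (lim == "") exit 1; print lim }
  ' "$1"
}
# usage: anom_limit XSCALE.LP
```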
The substructure (4 Se sites, located with anomalous data to 3 Å) and the structure (198 residues) can now easily be solved with [[ccp4com:SHELX C/D/E|hkl2map]]:
== Result ==
=== SHELXC: anomalous CC<sub>1/2</sub> ===
[[File:Cc12ano.png]]
=== SHELXD: CCall ''versus'' CCweak, and histogram ===
[[File:Ccallcsccweak.png]]
[[File:Histcfom.png]]
=== SHELXE: contrast versus cycle, and PDB with structure ===
[[File:Contrastvscycle.png]]
[[File:Ribbon.png]]
Further optimization of processing may be possible, but is left as an exercise to the reader.