SSX: Difference between revisions

17,559 bytes added ,  5 August 2019
This article deals with how to process serial synchrotron crystallography (SSX) data.
The particular data we are processing are artificial and were prepared by James Holton. The files Illuin_microfocus_minimal_00[1-3].tar.bz2 can be [http://bl831.als.lbl.gov/example_data_sets/tarballs downloaded]; the data and the problem are described on his [http://bl831.als.lbl.gov/~jamesh/challenge/microfocus microfocus challenge page] and in a [http://journals.iucr.org/d/issues/2019/02/00/ba5297/index.html paper].
The challenges are
# partial data sets: each of the 100 data sets has only 3 good frames of 1° oscillation; later frames have strong radiation damage
# the crystals decay to about 1/2 within these 3 frames
# the b and c axes are the same length, but the simulated crystals are orthorhombic. This makes it difficult to index them consistently: it is wrong to just merge them in an orthorhombic space group without resolving the indexing ambiguity, because that yields a pseudo-tetragonal, twinned merged data set.
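Why b = c creates an ambiguity can be checked numerically. For an orthorhombic cell, 1/d² = h²/a² + k²/b² + l²/c², so exchanging k and l leaves the resolution of a reflection unchanged when b = c, and the lattice alone cannot distinguish the two indexing choices. The following is only a minimal sketch: the function name <code>d_spacing</code> is hypothetical, and the bare exchange of k and l serves as an illustration (a proper reindexing operator would also change a sign to keep the axes right-handed, which does not affect d).

```shell
# d_spacing A B C H K L: resolution (in Angstrom) of reflection (H,K,L) for an
# orthorhombic cell with axes A, B, C, using 1/d^2 = h^2/a^2 + k^2/b^2 + l^2/c^2.
d_spacing() {
  awk -v a="$1" -v b="$2" -v c="$3" -v h="$4" -v k="$5" -v l="$6" 'BEGIN {
    printf "%.3f\n", 1 / sqrt((h/a)^2 + (k/b)^2 + (l/c)^2)
  }'
}

# Cell of this article (b = c): both calls print the same value.
d_spacing 38.3 79.1 79.1 1 2 3
d_spacing 38.3 79.1 79.1 1 3 2
```

With unequal b and c the two calls would differ, which is what normally fixes the axis assignment during indexing.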
== Round 1: processing the data, and determining the space group ==
In order to be able to merge the data in XSCALE, we must ensure that they are all processed in the same space group, with similar cell parameters. Some exploratory processing (not shown) and averaging of cell parameters reveal that IDXREF finds a primitive lattice with one axis of 38.3 Å and two axes of 79.1 Å; all angles are 90°. The data go to 1.8 Å; beyond that, the intensities suddenly drop to 0, presumably because James Holton simulated them only that far.
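The averaging of cell parameters is not shown in this article. One way it could be done is sketched here, under the assumption that each exploratory run left an XDS_ASCII.HKL whose header contains a <code>!UNIT_CELL_CONSTANTS=</code> line and that the axes are listed in the same order in every file. The function name <code>mean_cell</code> is hypothetical.

```shell
# mean_cell FILE...: average the six numbers of the "!UNIT_CELL_CONSTANTS="
# header line over all given files (header layout is an assumption).
mean_cell() {
  awk '/^!UNIT_CELL_CONSTANTS=/ {
         sub(/^!UNIT_CELL_CONSTANTS=/, "")      # strip the keyword; fields are re-split
         for (i = 1; i <= 6; i++) sum[i] += $i
         n++
       }
       END {
         if (n == 0) exit 1                     # no header line found
         for (i = 1; i <= 6; i++) printf "%.2f%s", sum[i] / n, (i < 6 ? " " : "\n")
       }' "$@"
}

# Possible use from the directory that holds the wedge directories:
#   mean_cell wedge*/XDS_ASCII.HKL
```

A plain mean is sensitive to misindexed wedges, so inspecting the individual values first may be worthwhile.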
Using the following as the processing script integrate.rc:
<pre>
#!/bin/bash -f
# Round 1: integrate each of the 100 wedges in P1, each in its own directory
for f in `seq 1 100`;
do
export OUT=wedge0`printf "%03d" $f`
export NAMES="$PWD/Illuin/microfocus/xtal"`printf "%03d" $f`"_1_00\?.img"
rm -rf $OUT
mkdir $OUT
cd $OUT
generate_XDS.INP $NAMES
# adapt the generated XDS.INP: spot search on frames 1-3, P1 with the averaged cell,
# smaller trusted region, and data limited to 1.8 A
sed -i s"/SPOT_RANGE=1 1/SPOT_RANGE=1 3/" XDS.INP
sed -i s"/SPACE_GROUP_NUMBER=0/SPACE_GROUP_NUMBER=1/" XDS.INP
sed -i s"/UNIT_CELL_CONSTANTS= 70 80 90/UNIT_CELL_CONSTANTS=38.3 79.1 79.1/" XDS.INP
sed -i s"/TRUSTED_REGION=0.0 1.2/TRUSTED_REGION=0 1/" XDS.INP
sed -i s"/INCLUDE_RESOLUTION_RANGE=50 0/INCLUDE_RESOLUTION_RANGE=99 1.8/" XDS.INP
/usr/local/bin/xds_par
cd ..
done
# prepare XSCALE.INP listing the XDS_ASCII.HKL of all wedges (XSCALE itself is not run here)
mkdir xscale
cd xscale
cat >XSCALE.INP <<eof
SPACE_GROUP_NUMBER= 1
UNIT_CELL_CONSTANTS= 38.3 79.1 79.1 90 90 90
OUTPUT_FILE=temp.ahkl
SAVE_CORRECTION_IMAGES=FALSE
FRIEDEL'S_LAW=TRUE
eof
find $PWD/../wedge* -name XDS_ASCII.HKL | awk '{print "INPUT_FILE=",$0;print "NBATCH=1 CORRECTIONS=ALL"}' >> XSCALE.INP
</pre>
we obtain in P1
<pre>
  SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
  ...
     total      256616  81419    86791      93.8%      54.3%    51.3%  243147    1.43    64.0%    64.6*    0    0.788  17331
</pre>
 
and feed this to pointless:
 
  pointless xdsin temp.ahkl
 
which tells us
<pre>
Scores for each symmetry element
...
</pre>
Based on the P4(2)2(1)2 suggestion, we may try to modify the header of XSCALE.INP to
<pre>
SPACE_GROUP_NUMBER= 94
UNIT_CELL_CONSTANTS= 79.1 79.1 38.3 90 90 90
OUTPUT_FILE=temp.ahkl
SAVE_CORRECTION_IMAGES=FALSE
FRIEDEL'S_LAW=TRUE
REIDX=0 1 0 0  0 0 1 0  1 0 0 0
</pre>
where the last line shuffles the axes into the order k,l,h (after all, the XDS_ASCII.HKL files are in P1 with a,b,c of 38.3, 79.1, 79.1), and obtain
<pre>
SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION    NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA  R-meas  CC(1/2)  Anomal  SigAno  Nano
  LIMIT    OBSERVED  UNIQUE  POSSIBLE    OF DATA  observed  expected                                      Corr
    8.03        2978    167      167      100.0%      53.6%    45.8%    2978    5.94    55.1%    99.2*    22    1.190      76
    5.68        5488    274      274      100.0%      54.0%    46.1%    5488    6.12    55.4%    97.0*    20    0.915    175
    4.64        6976    338      338      100.0%      55.4%    46.1%    6976    6.25    57.0%    99.1*    15    0.983    237
    4.01        8069    390      390      100.0%      57.5%    46.3%    8069    6.01    59.0%    93.7*    8    0.991    294
    3.59        9191    440      440      100.0%      63.9%    46.7%    9191    5.80    65.5%    89.2*    3    1.071    338
    3.28      10239    474      474      100.0%      63.8%    47.0%    10239    5.85    65.4%    89.4*    4    1.119    375
    3.03      11037    511      511      100.0%      66.0%    47.5%    11037    5.33    67.6%    91.7*    3    1.068    412
    2.84      12014    547      547      100.0%      69.6%    49.1%    12014    4.80    71.2%    82.2*    -1    1.092    447
    2.68      12698    580      580      100.0%      72.2%    51.0%    12698    4.34    73.9%    83.8*    -7    0.969    478
    2.54      13360    612      612      100.0%      73.5%    54.1%    13360    3.98    75.3%    73.4*    4    1.025    511
    2.42      14299    642      642      100.0%      76.8%    58.2%    14299    3.59    78.6%    57.0*    6    1.016    545
    2.32      14827    667      667      100.0%      77.8%    62.3%    14827    3.38    79.6%    70.3*    1    0.924    563
    2.23      15588    698      698      100.0%      79.5%    64.6%    15588    3.22    81.3%    64.9*    -1    0.914    597
    2.15      15888    705      705      100.0%      79.3%    68.0%    15888    3.23    81.1%    52.5*    -5    0.882    614
    2.07      16867    754      754      100.0%      82.7%    74.7%    16867    2.92    84.6%    50.1*    3    0.920    647
    2.01      16847    754      754      100.0%      86.1%    77.3%    16847    2.73    88.1%    47.6*    -3    0.839    658
    1.95      17842    799      799      100.0%      90.4%    86.7%    17842    2.47    92.4%    49.3*    1    0.822    696
    1.89      18095    810      811      99.9%      96.8%    101.2%    18095    2.21    99.1%    44.6*    -4    0.773    707
    1.84      18633    829      829      100.0%    106.4%    126.3%    18633    1.90    108.9%    39.6*    -6    0.730    736
    1.80      15510    824      863      95.5%    118.1%    151.4%    15500    1.46    121.2%    32.3*    2    0.688    699
    total      256446  11815    11855      99.7%      64.9%    51.6%  256436    3.61    66.5%    97.9*    1    0.910    9805
</pre>
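The effect of the REIDX line used for this run can be checked on a single reflection. The sketch below assumes the usual reading of the twelve numbers as three rows of the form h' = r1*h + r2*k + r3*l + r4; the function name <code>reidx</code> is hypothetical and the code is not part of XDS or XSCALE.

```shell
# reidx "R1 ... R12" H K L: apply a 12-number reindexing specification to one
# reflection, assuming H' = R1*H + R2*K + R3*L + R4 (rows 2 and 3 likewise).
reidx() {
  echo "$1 $2 $3 $4" | awk '{
    h = $13; k = $14; l = $15          # fields 1-12 hold the matrix, 13-15 the indices
    print $1*h + $2*k + $3*l + $4, $5*h + $6*k + $7*l + $8, $9*h + $10*k + $11*l + $12
  }'
}

reidx "0 1 0 0  0 0 1 0  1 0 0 0" 1 2 3   # prints: 2 3 1
```

The reflection (1,2,3) becomes (2,3,1), i.e. the new indices are k,l,h, which moves the 38.3 Å axis to the third position as required by the tetragonal cell 79.1 79.1 38.3.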
Analysis with
 xscale_isocluster -dim 2 -clu 2 temp.ahkl
yields an iso.pdb that does not show a compact cluster; instead, it shows a single, severely elongated cloud:
[[File:1g1c-94.png]]
(If the space group were correct, the result of [[xscale_isocluster]] should look similar to this:
[[File:Lyso-xscale-isocluster.png]]
which is from a lysozyme SSX data collection performed at the SLS; outliers are labelled. In this case, the data are truly tetragonal.)
We must now investigate whether the data have lower than tetragonal symmetry.
XSCALEing with
SPACE_GROUP_NUMBER=16
UNIT_CELL_CONSTANTS=38.3 79.1 79.1 90 90 90
gives a new temp.ahkl, with orthorhombic symmetry.
xscale_isocluster -dim 2 -clu 2 temp.ahkl
gives
<pre>
psi=  0.1692468      nhalo=          0
cluster:  1 center:    2 elements:    51 core:    51 halo:    0
cluster:  2 center:    6 elements:    49 core:    49 halo:    0
</pre>
and prepares XSCALE.1.INP and XSCALE.2.INP for further use. These two files collect XDS_ASCII.HKL files that are indexed consistently within each file, but differently between the two files.
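When many clustering runs are compared, the cluster sizes can be pulled out of the screen output automatically. This is a small sketch that relies only on the line layout shown above (lines beginning with <code>cluster:</code> that contain an <code>elements:</code> label); the function name <code>cluster_sizes</code> is hypothetical.

```shell
# cluster_sizes: read xscale_isocluster screen output on stdin and print
# "cluster-number element-count" for each line that starts with "cluster:".
cluster_sizes() {
  awk '$1 == "cluster:" {
    for (i = 2; i < NF; i++) if ($i == "elements:") print $2, $(i + 1)
  }'
}

# Possible use:
#   xscale_isocluster -dim 2 -clu 2 temp.ahkl | cluster_sizes
```

For the output shown above this gives 51 and 49 data sets, i.e. a nearly even split between the two indexing modes.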
coot iso.pdb
shows
[[File:Coot.png]]
and thus reveals two well-separated clouds, corresponding to the two possible indexing modes of the data in an orthorhombic space group.
Using XSCALE.1.INP with its 51 XDS_ASCII.HKL, and FRIEDEL'S_LAW=TRUE, we get
<pre>
SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION    NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA  R-meas  CC(1/2)  Anomal  SigAno  Nano
  LIMIT    OBSERVED  UNIQUE  POSSIBLE    OF DATA  observed  expected                                      Corr
    8.03        1493    297      306      97.1%      11.8%    23.7%    1467    6.04    13.0%    98.2*    52*  0.662    123
    5.68        2829    514      521      98.7%      18.9%    24.2%    2796    5.98    20.9%    96.1*    26*  0.778    258
    4.64        3576    638      646      98.8%      23.3%    24.2%    3554    6.07    25.7%    93.3*    12    0.829    346
    4.01        4140    748      756      98.9%      28.2%    24.5%    4105    5.84    31.0%    89.4*    -5    0.818    418
    3.59        4735    838      852      98.4%      30.9%    25.0%    4709    5.72    33.9%    86.7*    5    0.983    470
    3.28        5268    912      921      99.0%      34.7%    25.8%    5228    5.52    38.0%    85.9*    0    1.005    533
    3.03        5664    982      994      98.8%      37.8%    27.4%    5634    4.90    41.4%    82.1*    4    1.031    563
    2.84        6114    1065      1068      99.7%      40.4%    31.7%    6082    4.13    44.4%    82.5*    5    0.963    613
    2.68        6486    1127      1133      99.5%      44.5%    37.2%    6450    3.54    48.9%    74.8*    1    0.824    644
    2.54        6819    1188      1197      99.2%      48.2%    44.6%    6784    3.01    53.0%    70.4*    1    0.816    709
    2.42        7278    1249      1259      99.2%      51.9%    54.7%    7249    2.56    56.9%    70.6*    4    0.751    756
    2.32        7595    1297      1304      99.5%      55.9%    63.4%    7555    2.26    61.5%    58.5*    4    0.729    809
    2.23        7943    1361      1371      99.3%      57.8%    66.4%    7903    2.16    63.3%    63.5*    -3    0.687    844
    2.15        8093    1375      1385      99.3%      60.1%    75.4%    8054    2.03    65.9%    66.7*    3    0.664    860
    2.07        8561    1476      1482      99.6%      64.8%    88.3%    8512    1.76    71.1%    53.0*    7    0.640    914
    2.01        8613    1473      1482      99.4%      68.3%    95.8%    8570    1.60    74.9%    60.6*    -1    0.628    928
    1.95        9048    1566      1571      99.7%      73.1%    112.2%    9004    1.41    80.2%    56.7*    -3    0.571    966
    1.89        9236    1580      1593      99.2%      82.6%    142.1%    9204    1.19    90.8%    56.3*    -5    0.504    1000
    1.84        9467    1618      1631      99.2%      92.8%    180.0%    9432    0.96    101.9%    43.2*    4    0.467    1007
    1.80        7927    1570      1701      92.3%    104.8%    225.2%    7811    0.70    116.1%    42.6*    -5    0.425    785
    total      130885  22874    23173      98.7%      38.3%    41.0%  130103    2.77    42.1%    92.0*    3    0.703  13546
</pre>
At this point, we run
xdscc12 -w XSCALE.1.HKL | grep ^a | sort -nk6
and find that data sets 1 and 17 are wrongly included in the cloud of 51 data sets. Thus they are removed manually from XSCALE.INP.
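The manual removal can also be scripted. The sketch below makes two assumptions that should be checked against the actual files: data set numbers count the <code>INPUT_FILE=</code> lines in file order, starting at 1, and each <code>INPUT_FILE=</code> line is followed only by parameter lines that belong to it (as in the XSCALE.INP written by integrate.rc). The function name <code>drop_datasets</code> and the output file name are hypothetical.

```shell
# drop_datasets "N1 N2 ..." < XSCALE.INP: remove the N-th INPUT_FILE= entries
# together with the lines that follow each of them up to the next INPUT_FILE=.
drop_datasets() {
  awk -v drop="$1" '
    BEGIN { n = split(drop, d, " "); for (i = 1; i <= n; i++) skip[d[i]] = 1 }
    /^[ \t]*INPUT_FILE=/ { count++ }    # entering the next data set
    !(count in skip) { print }          # header lines (count still unset) are kept
  '
}

# Possible use:
#   drop_datasets "1 17" < XSCALE.INP > XSCALE.pruned.INP
```

Any global keyword placed after a dropped <code>INPUT_FILE=</code> line would be lost as well, so the result is best compared with the original before running XSCALE.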
After <code>xscale_isocluster -dim 2 -clu 1</code>,
coot iso.pdb
now reveals a single cloud:
[[File:1g1c-19.png]]
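We then re-run XSCALE with MERGE=TRUE. The resulting reflection output file XSCALE.1.HKL is then used as REFERENCE_DATA_SET for a second round of integration with XDS. The Round 2 script reads the reference as ../reference.hkl, so the file presumably has to be copied or linked under that name, one level above the wedge directories.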
pointless xdsin XSCALE.1.HKL
gives
<pre>
  Spacegroup        TotProb SysAbsProb    Reindex        Conditions
    P 21 21 21 ( 19)    0.896  0.924                        h00: h=2n, 0k0: k=2n, 00l: l=2n (zones 1,2,3)
    ..........
    P 2 21 21 ( 18)    0.044  0.045                        0k0: k=2n, 00l: l=2n (zones 2,3)
    ..........
    P 21 21 2 ( 18)    0.015  0.015                        h00: h=2n, 0k0: k=2n (zones 1,2)
    ..........
    P 21 2 21 ( 18)    0.014  0.014                        h00: h=2n, 00l: l=2n (zones 1,3)
---------------------------------------------------------------
Space group confidence (= Sqrt(Score * (Score - NextBestScore))) =    0.87
Laue group confidence  (= Sqrt(Score * (Score - NextBestScore))) =    0.97
Selecting space group P 21 21 21 as there is a single space group with the highest score
<!--SUMMARY_BEGIN--> $TEXT:Result: $$ $$
Best Solution:    space group P 21 21 21
  Reindex operator:                  [h,k,l]               
  Laue group probability:            0.970
  Systematic absence probability:    0.924
  Total probability:                  0.896
  Space group confidence:            0.874
  Laue group confidence              0.966
  Unit cell:  38.30  79.10  79.10    90.00  90.00  90.00
  79.10 to  2.47  - Resolution range used for Laue group search
  79.10 to  1.80  - Resolution range in file, used for systematic absence check
</pre>
Thus we now know the space group: P2<sub>1</sub>2<sub>1</sub>2<sub>1</sub> (number 19).
== Round 2: using the REFERENCE_DATA_SET obtained from one cluster ==
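The processing script integrate.rc is changed a bit, to a) use the REFERENCE_DATA_SET, b) prevent adjustment of variances by CORRECT with MINIMUM_I/SIGMA=50 (this adjustment should rather be done by XSCALE), and c) allow some radiation damage correction in XSCALE (NBATCH=3 instead of NBATCH=1). In addition, the XSCALE.INP header no longer sets FRIEDEL'S_LAW=TRUE: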
<pre>
#!/bin/bash -f
# Round 2: re-integrate all wedges in space group 19, indexed like the reference data set
for f in `seq 1 100`;
do
export OUT=wedge0`printf "%03d" $f`
export NAMES="$PWD/Illuin/microfocus/xtal"`printf "%03d" $f`"_1_00\?.img"
rm -rf $OUT
mkdir $OUT
cd $OUT
generate_XDS.INP $NAMES
# a) index consistently with the merged data of one cluster
echo REFERENCE_DATA_SET=../reference.hkl >> XDS.INP
# b) keep CORRECT from adjusting the variances
echo MINIMUM_I/SIGMA=50 >>XDS.INP
sed -i s"/SPOT_RANGE=1 1/SPOT_RANGE=1 3/" XDS.INP
sed -i s"/SPACE_GROUP_NUMBER=0/SPACE_GROUP_NUMBER=19/" XDS.INP
sed -i s"/UNIT_CELL_CONSTANTS= 70 80 90/UNIT_CELL_CONSTANTS=38.3 79.1 79.1/" XDS.INP
sed -i s"/TRUSTED_REGION=0.0 1.2/TRUSTED_REGION=0 1/" XDS.INP
sed -i s"/INCLUDE_RESOLUTION_RANGE=50 0/INCLUDE_RESOLUTION_RANGE=99 1.8/" XDS.INP
/usr/local/bin/xds_par
cd ..
done
# c) NBATCH=3 instead of 1, to allow some radiation damage correction in XSCALE
mkdir xscale
cd xscale
cat >XSCALE.INP <<eof
SPACE_GROUP_NUMBER= 19
UNIT_CELL_CONSTANTS= 38.3 79.1 79.1 90 90 90
OUTPUT_FILE=temp.ahkl
SAVE_CORRECTION_IMAGES=FALSE
eof
find $PWD/../wedge* -name XDS_ASCII.HKL | awk '{print "INPUT_FILE=",$0;print "NBATCH=3 CORRECTIONS=ALL"}' >> XSCALE.INP
</pre>
and we get in XSCALE.LP:
<pre>
      NOTE:      Friedel pairs are treated as different reflections.
SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION    NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA  R-meas  CC(1/2)  Anomal  SigAno  Nano
  LIMIT    OBSERVED  UNIQUE  POSSIBLE    OF DATA  observed  expected                                      Corr
    8.04        2960    473      476      99.4%      6.2%      5.5%    2955  29.90      6.7%    99.8*    86*  2.824    166
    5.68        5486    890      894      99.6%      4.9%      5.9%    5478  27.38      5.3%    99.7*    86*  2.384    363
    4.64        6934    1136      1138      99.8%      4.9%      5.8%    6918  27.64      5.4%    99.8*    76*  1.829    480
    4.02        8066    1363      1367      99.7%      5.3%      5.9%    8045  26.67      5.9%    99.6*    57*  1.426    590
    3.59        9121    1535      1539      99.7%      6.1%      6.3%    9092  25.58      6.7%    99.6*    50*  1.298    666
    3.28      10222    1690      1694      99.8%      6.8%      6.8%    10203  24.69      7.5%    99.4*    36*  1.204    751
    3.04      10990    1831      1834      99.8%      8.5%      8.0%    10970  21.40      9.3%    99.3*    22*  1.086    827
    2.84      12065    1993      1999      99.7%      11.2%    11.1%    12038  17.68    12.2%    99.0*    24*  1.085    894
    2.68      12771    2120      2124      99.8%      14.7%    15.1%    12738  14.78    16.1%    98.4*    14*  0.960    952
    2.54      13054    2196      2198      99.9%      18.9%    20.2%    13026  12.53    20.8%    97.7*    13*  0.867    995
    2.42      14290    2372      2375      99.9%      24.9%    27.1%    14261  10.34    27.3%    96.1*    6    0.813    1083
    2.32      14704    2432      2438      99.8%      29.8%    32.5%    14676    9.21    32.6%    95.1*    8    0.843    1115
    2.23      15623    2582      2593      99.6%      33.0%    35.0%    15587    8.83    36.1%    93.0*    6    0.831    1180
    2.15      15732    2610      2613      99.9%      37.1%    39.2%    15697    8.10    40.6%    91.0*    8    0.818    1203
    2.08      16782    2788      2795      99.7%      44.1%    47.0%    16741    7.01    48.3%    88.3*    4    0.797    1276
    2.01      16783    2802      2809      99.8%      46.8%    48.7%    16747    6.54    51.2%    89.5*    3    0.807    1293
    1.95      18262    3043      3051      99.7%      56.5%    58.0%    18221    5.61    61.9%    85.9*    0    0.803    1402
    1.89      17810    2979      2988      99.7%      68.3%    69.8%    17769    4.63    74.8%    80.0*    7    0.864    1374
    1.84      18503    3112      3117      99.8%      87.5%    90.3%    18454    3.55    96.0%    69.6*    3    0.838    1435
    1.80      16130    2988      3185      93.8%    101.2%    110.5%    15959    2.77    111.7%    62.9*    2    0.798    1276
    total      256288  42935    43227      99.3%      13.4%    14.0%  255575  11.63    14.6%    99.6*    21*  0.975  19321
</pre>
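To compare runs such as Round 1 and Round 2 without reading the whole table, the overall I/SIGMA and CC(1/2) can be extracted from the "total" row. This is a sketch that assumes the column layout of the tables shown in this article (field 9 = I/SIGMA, field 11 = CC(1/2)) and that the last "total" row in the file is the one of interest; the function name <code>xscale_totals</code> is hypothetical.

```shell
# xscale_totals: read XSCALE.LP-style text on stdin and print I/SIGMA and
# CC(1/2) from the last row whose first field is "total".
xscale_totals() {
  awk '$1 == "total" && NF >= 11 {
         isig = $9; cc = $11
         sub(/\*$/, "", cc)             # drop the significance asterisk
         found = 1
       }
       END { if (found) print isig, cc; else exit 1 }'
}

# Possible use:
#   xscale_totals < XSCALE.LP
```

Applied to the tables in this article, the overall values improve from 2.77 and 92.0 for the 51-data-set cluster in Round 1 to 11.63 and 99.6 for all data in Round 2.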
The substructure (locating 4 Se with anomalous data to 3 Å) and the structure (198 residues) can now easily be solved with [[ccp4com:SHELX C/D/E|hkl2map]]:
== Result ==
=== SHELXC: anomalous CC<sub>1/2</sub> ===
[[File:Cc12ano.png]]
=== SHELXD: CCall ''versus'' CCweak, and histogram ===
[[File:Ccallcsccweak.png]]
[[File:Histcfom.png]]
=== SHELXE: contrast versus cycle, and PDB with structure ===
[[File:Contrastvscycle.png]]
[[File:Ribbon.png]]
Further optimization of processing may be possible, but is left as an exercise to the reader.