This article deals with how to process serial synchrotron crystallography (SSX) data.
The particular data we are processing are artificial and were prepared by James Holton. The files Illuin_microfocus_minimal_00[1-3].tar.bz2 can be [http://bl831.als.lbl.gov/example_data_sets/tarballs downloaded]; the data and the problem are described on his [http://bl831.als.lbl.gov/~jamesh/challenge/microfocus microfocus challenge page] and in a [http://journals.iucr.org/d/issues/2019/02/00/ba5297/index.html paper].
The challenges are
# partial data sets: each of the 100 data sets has only 3 good frames of 1° oscillation; later frames have strong radiation damage
# the crystals decay to about 1/2 within these 3 frames
# the b and c axes are the same length, but the simulated crystals are orthorhombic. This makes it difficult to index them consistently. It is wrong to just merge them in an orthorhombic space group without resolving the indexing ambiguity, because that yields a pseudo-tetragonal, twinned merged data set.
The solution is to use [[XSCALE]] for scaling, and [[xscale_isocluster]] for analysing the scaled data.
== Round 1: processing the data, and determining the space group ==
In order to be able to merge the data in XSCALE, we must ensure that they are all processed in the same space group, with similar cell parameters. Some exploratory processing (not shown) and averaging of cell parameters reveals that IDXREF finds a primitive lattice with one axis of 38.3 Å, and two with 79.1 Å; angles are 90°. The data go to 1.8 Å; beyond that, the intensities suddenly drop to 0 - presumably because James Holton simulated them only that far.
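The averaging of cell parameters is not shown in this article. A minimal sketch of how it could be done, assuming that each exploratory run left an XDS_ASCII.HKL whose header contains a <code>!UNIT_CELL_CONSTANTS=</code> line with six whitespace-separated values; the function name <code>average_cell</code> is hypothetical and not part of XDS:

```shell
# Average the six cell parameters found in XDS_ASCII.HKL header lines.
# Reads header lines on stdin and prints the mean a b c alpha beta gamma.
# Hypothetical usage (paths are assumptions):
#   grep -h '^!UNIT_CELL_CONSTANTS=' wedge*/XDS_ASCII.HKL | average_cell
average_cell() {
  awk '/^!UNIT_CELL_CONSTANTS=/ {
         n++
         # field 1 is the keyword itself, fields 2..7 are the cell parameters
         for (i = 2; i <= 7; i++) sum[i] += $i
       }
       END {
         if (n == 0) exit 1   # no cell lines seen
         printf "%.1f %.1f %.1f %.1f %.1f %.1f\n",
                sum[2]/n, sum[3]/n, sum[4]/n, sum[5]/n, sum[6]/n, sum[7]/n
       }'
}
```

A plain mean is only meaningful if all crystals were indexed with the axes in the same order; with one short and two equal long axes this is usually the case here.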
Using the following as the processing script integrate.rc:
<pre>
#!/bin/bash
# globbing must stay enabled: the find command below relies on wedge* being expanded
for f in `seq 1 100`;
do
export OUT=wedge0`printf "%03d" $f`
export NAMES="$PWD/Illuin/microfocus/xtal"`printf "%03d" $f`"_1_00\?.img"
rm -rf $OUT
mkdir $OUT
cd $OUT
generate_XDS.INP $NAMES
sed -i s"/SPOT_RANGE=1 1/SPOT_RANGE=1 3/" XDS.INP
sed -i s"/SPACE_GROUP_NUMBER=0/SPACE_GROUP_NUMBER=1/" XDS.INP
sed -i s"/UNIT_CELL_CONSTANTS= 70 80 90/UNIT_CELL_CONSTANTS=38.3 79.1 79.1/" XDS.INP
sed -i s"/TRUSTED_REGION=0.0 1.2/TRUSTED_REGION=0 1/" XDS.INP
sed -i s"/INCLUDE_RESOLUTION_RANGE=50 0/INCLUDE_RESOLUTION_RANGE=99 1.8/" XDS.INP
/usr/local/bin/xds_par
cd ..
done
mkdir xscale
cd xscale
cat >XSCALE.INP <<eof
SPACE_GROUP_NUMBER= 1
UNIT_CELL_CONSTANTS= 38.3 79.1 79.1 90 90 90
OUTPUT_FILE=temp.ahkl
SAVE_CORRECTION_IMAGES=FALSE
FRIEDEL'S_LAW=TRUE
eof
find $PWD/../wedge* -name XDS_ASCII.HKL | awk '{print "INPUT_FILE=",$0;print "NBATCH=1 CORRECTIONS=ALL"}' >> XSCALE.INP
</pre>
we obtain in P1
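<pre>
 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

     8.03        3014     908       958       94.8%      44.5%     42.0%     2896    2.55     52.1%    65.0*     3    0.983     231
     5.68        5502    1679      1788       93.9%      46.8%     42.5%     5239    2.50     54.8%    50.3*     6    1.001     390
     4.64        6996    2164      2292       94.4%      47.5%     42.3%     6656    2.48     55.9%    68.4*     5    1.080     495
     4.01        8079    2580      2735       94.3%      48.7%     42.5%     7591    2.38     57.3%    50.0*     2    1.106     557
     3.59        9167    2904      3099       93.7%      52.1%     42.7%     8694    2.36     61.7%    43.6*    -6    1.017     599
     3.28       10276    3226      3397       95.0%      53.3%     43.3%     9728    2.35     62.8%    36.0*     1    1.104     708
     3.03       11040    3472      3687       94.2%      54.5%     44.3%    10500    2.17     64.2%    44.4*     2    1.044     728
     2.84       12022    3771      3977       94.8%      55.9%     47.2%    11424    1.97     65.8%    36.2*     3    0.999     835
     2.68       12705    3985      4227       94.3%      58.5%     51.0%    12065    1.78     68.8%    37.8*    -3    0.934     898
     2.54       13370    4252      4489       94.7%      59.5%     56.2%    12670    1.61     70.5%    30.1*     4    0.887     869
     2.42       14299    4505      4744       95.0%      62.4%     63.6%    13594    1.46     73.7%    30.2*    -2    0.824     979
     2.32       14835    4647      4915       94.5%      63.8%     70.0%    14083    1.35     75.1%    29.9*    -2    0.765    1041
     2.23       15599    4917      5181       94.9%      65.7%     72.6%    14809    1.31     77.5%    27.6*    -1    0.756    1075
     2.15       15888    4965      5272       94.2%      65.1%     78.6%    15117    1.28     76.9%    26.8*    -2    0.708    1115
     2.07       16872    5324      5601       95.1%      69.1%     88.1%    16035    1.14     81.6%    22.2*     3    0.687    1119
     2.01       16856    5349      5649       94.7%      73.4%     92.5%    15988    1.06     86.5%    19.7*    -3    0.673    1144
     1.95       17842    5666      5976       94.8%      76.7%    105.9%    16959    0.97     90.8%    20.7*    -8    0.606    1189
     1.89       18102    5767      6069       95.0%      84.4%    127.9%    17152    0.85     99.9%    15.1*    -1    0.590    1183
     1.84       18633    5933      6256       94.8%      92.8%    162.0%    17667    0.72    109.8%    17.6*     0    0.533    1236
     1.80       15519    5405      6479       83.4%     103.0%    194.1%    14280    0.58    122.7%    18.2*     1    0.503     940
    total      256616   81419     86791       93.8%      54.3%     51.3%   243147    1.43     64.0%    64.6*     0    0.788   17331
</pre>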
 
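and feed this to pointless:

 pointless xdsin temp.ahkl

which tells us
<pre>
Scores for each symmetry element

Nelmt  Lklhd  Z-cc    CC        N  Rmeas    Symmetry & operator (in Lattice Cell)

  1   0.854   5.41   0.54     801  0.706     identity
  2   0.842   4.62   0.46     785  0.819 **  2-fold l ( 0 0 1) {-h,-k,l}
  3   0.867   5.13   0.51     746  0.912 **  2-fold k ( 0 1 0) {-h,k,-l}
  4   0.837   5.64   0.56     735  0.807 **  2-fold h ( 1 0 0) {h,-k,-l}
  5   0.869   4.96   0.50     742  0.757 **  2-fold   ( 1-1 0) {-k,-h,-l}
  6   0.846   5.52   0.55     719  0.789 **  2-fold   ( 1 1 0) {k,h,-l}
  7   0.852   5.44   0.54    1325  1.146 **  4-fold l ( 0 0 1) {-k,h,l}{k,-h,l}
...
...
Best Solution:    space group P 42 21 2

   Reindex operator:                   [k,l,h]
   Laue group probability:             0.989
   Systematic absence probability:     0.915
   Total probability:                  0.905
   Space group confidence:             0.874
   Laue group confidence               0.986

   Unit cell:   79.10  79.10  38.30     90.00  90.00  90.00

   79.10 to  13.70   - Resolution range used for Laue group search

   79.10 to   1.80   - Resolution range in file, used for systematic absence check

   Number of batches in file:      3

The data do not appear to be twinned, from the L-test

$$ <!--SUMMARY_END-->


HKLIN spacegroup: P 1  primitive triclinic

$TEXT:Warning:$$ $$

The input crystal system is primitive triclinic
 (Cell:   38.30  79.10  79.10     90.00  90.00  90.00)
The crystal system chosen for output is primitive tetragonal
 (Cell:   79.10  79.10  38.30     90.00  90.00  90.00)
</pre>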
Based on the P4(2)2(1)2 suggestion, we may try to modify the header of XSCALE.INP to
<pre>
SPACE_GROUP_NUMBER= 94
UNIT_CELL_CONSTANTS= 79.1 79.1 38.3 90 90 90
OUTPUT_FILE=temp.ahkl
SAVE_CORRECTION_IMAGES=FALSE
FRIEDEL'S_LAW=TRUE
REIDX=0 1 0 0  0 0 1 0  1 0 0 0
</pre>
where the last line takes care of shuffling the axes into the order k,l,h (after all, the XDS_ASCII.HKL files are in P1 with a,b,c of 38.3, 79.1, 79.1), and obtain
<pre>
 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

     8.03        2978     167       167      100.0%      53.6%     45.8%     2978    5.94     55.1%    99.2*    22    1.190      76
     5.68        5488     274       274      100.0%      54.0%     46.1%     5488    6.12     55.4%    97.0*    20    0.915     175
     4.64        6976     338       338      100.0%      55.4%     46.1%     6976    6.25     57.0%    99.1*    15    0.983     237
     4.01        8069     390       390      100.0%      57.5%     46.3%     8069    6.01     59.0%    93.7*     8    0.991     294
     3.59        9191     440       440      100.0%      63.9%     46.7%     9191    5.80     65.5%    89.2*     3    1.071     338
     3.28       10239     474       474      100.0%      63.8%     47.0%    10239    5.85     65.4%    89.4*     4    1.119     375
     3.03       11037     511       511      100.0%      66.0%     47.5%    11037    5.33     67.6%    91.7*     3    1.068     412
     2.84       12014     547       547      100.0%      69.6%     49.1%    12014    4.80     71.2%    82.2*    -1    1.092     447
     2.68       12698     580       580      100.0%      72.2%     51.0%    12698    4.34     73.9%    83.8*    -7    0.969     478
     2.54       13360     612       612      100.0%      73.5%     54.1%    13360    3.98     75.3%    73.4*     4    1.025     511
     2.42       14299     642       642      100.0%      76.8%     58.2%    14299    3.59     78.6%    57.0*     6    1.016     545
     2.32       14827     667       667      100.0%      77.8%     62.3%    14827    3.38     79.6%    70.3*     1    0.924     563
     2.23       15588     698       698      100.0%      79.5%     64.6%    15588    3.22     81.3%    64.9*    -1    0.914     597
     2.15       15888     705       705      100.0%      79.3%     68.0%    15888    3.23     81.1%    52.5*    -5    0.882     614
     2.07       16867     754       754      100.0%      82.7%     74.7%    16867    2.92     84.6%    50.1*     3    0.920     647
     2.01       16847     754       754      100.0%      86.1%     77.3%    16847    2.73     88.1%    47.6*    -3    0.839     658
     1.95       17842     799       799      100.0%      90.4%     86.7%    17842    2.47     92.4%    49.3*     1    0.822     696
     1.89       18095     810       811       99.9%      96.8%    101.2%    18095    2.21     99.1%    44.6*    -4    0.773     707
     1.84       18633     829       829      100.0%     106.4%    126.3%    18633    1.90    108.9%    39.6*    -6    0.730     736
     1.80       15510     824       863       95.5%     118.1%    151.4%    15500    1.46    121.2%    32.3*     2    0.688     699
    total      256446   11815     11855       99.7%      64.9%     51.6%   256436    3.61     66.5%    97.9*     1    0.910    9805
</pre>
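The effect of the REIDX line can be checked by hand. The sketch below assumes the usual XDS convention, in which the twelve integers are read as three rows of four, so that h' = REIDX(1)*h + REIDX(2)*k + REIDX(3)*l + REIDX(4), and likewise for k' and l'. The helper <code>apply_reidx</code> is hypothetical and not part of XDS:

```shell
# Apply a 12-integer REIDX transformation to one reflection index.
# usage: apply_reidx "<12 integers>" h k l
apply_reidx() {
  echo "$1 $2 $3 $4" | awk '{
    # fields 1..12 hold the REIDX integers, fields 13..15 the original h k l
    h = $13; k = $14; l = $15
    print $1*h + $2*k  + $3*l  + $4,
          $5*h + $6*k  + $7*l  + $8,
          $9*h + $10*k + $11*l + $12
  }'
}

# With the REIDX line used above, reflection (1,2,3) becomes (2,3,1), i.e. (k,l,h):
#   apply_reidx "0 1 0 0  0 0 1 0  1 0 0 0" 1 2 3
```

This moves the 38.3 Å axis from first to last position, as required for the tetragonal setting with cell 79.1 79.1 38.3.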
Analysis with
 xscale_isocluster -dim 2 -clu 2 temp.ahkl
yields an iso.pdb that does not show a compact cluster, but a single, severely elongated cloud:
[[File:1g1c-94.png]]
(If the space group were correct, the result of [[xscale_isocluster]] should look similar to this:
[[File:Lyso-xscale-isocluster.png]]
which is from a lysozyme SSX data collection performed at the SLS; outliers are labelled. In that case, the data are truly tetragonal.)
We must now investigate whether the data have lower than tetragonal symmetry.
XSCALEing with
 SPACE_GROUP_NUMBER=16
 UNIT_CELL_CONSTANTS=38.3 79.1 79.1 90 90 90
gives a new temp.ahkl, with orthorhombic symmetry. Running
 xscale_isocluster -dim 2 -clu 2 temp.ahkl
on it gives
<pre>
psi=  0.1692468      nhalo=          0
cluster:  1 center:    2 elements:    51 core:    51 halo:    0
cluster:  2 center:    6 elements:    49 core:    49 halo:    0
</pre>
and prepares XSCALE.1.INP and XSCALE.2.INP for further use. Each of these two files collects the XDS_ASCII.HKL files of one cluster; within a file the data sets are indexed consistently with each other, but differently from those in the other file.
 coot iso.pdb
shows
[[File:Coot.png]]
and thus reveals two well separated clouds, corresponding to the two possible indexing modes of the data in an orthorhombic space group.
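As a quick sanity check, the cluster sizes reported by xscale_isocluster should add up to the number of data sets (100 here). A small sketch that assumes the <code>cluster: ... elements: ...</code> line layout shown above; <code>cluster_sizes</code> is a hypothetical helper name:

```shell
# Print "cluster-number elements" for each cluster line, then the total.
# Reads the screen output of xscale_isocluster on stdin.
cluster_sizes() {
  awk '/^ *cluster:/ {
         # fields: cluster: N center: C elements: E core: ... halo: ...
         print $2, $6
         total += $6
       }
       END { print "total", total + 0 }'
}
```

For the output shown above this lists 51 and 49 elements, for a total of 100.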
Using XSCALE.1.INP with its 51 XDS_ASCII.HKL, and FRIEDEL'S_LAW=TRUE, we get
<pre>
 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

     8.03        1493     297       306       97.1%      11.8%     23.7%     1467    6.04     13.0%    98.2*    52*   0.662     123
     5.68        2829     514       521       98.7%      18.9%     24.2%     2796    5.98     20.9%    96.1*    26*   0.778     258
     4.64        3576     638       646       98.8%      23.3%     24.2%     3554    6.07     25.7%    93.3*    12    0.829     346
     4.01        4140     748       756       98.9%      28.2%     24.5%     4105    5.84     31.0%    89.4*    -5    0.818     418
     3.59        4735     838       852       98.4%      30.9%     25.0%     4709    5.72     33.9%    86.7*     5    0.983     470
     3.28        5268     912       921       99.0%      34.7%     25.8%     5228    5.52     38.0%    85.9*     0    1.005     533
     3.03        5664     982       994       98.8%      37.8%     27.4%     5634    4.90     41.4%    82.1*     4    1.031     563
     2.84        6114    1065      1068       99.7%      40.4%     31.7%     6082    4.13     44.4%    82.5*     5    0.963     613
     2.68        6486    1127      1133       99.5%      44.5%     37.2%     6450    3.54     48.9%    74.8*     1    0.824     644
     2.54        6819    1188      1197       99.2%      48.2%     44.6%     6784    3.01     53.0%    70.4*     1    0.816     709
     2.42        7278    1249      1259       99.2%      51.9%     54.7%     7249    2.56     56.9%    70.6*     4    0.751     756
     2.32        7595    1297      1304       99.5%      55.9%     63.4%     7555    2.26     61.5%    58.5*     4    0.729     809
     2.23        7943    1361      1371       99.3%      57.8%     66.4%     7903    2.16     63.3%    63.5*    -3    0.687     844
     2.15        8093    1375      1385       99.3%      60.1%     75.4%     8054    2.03     65.9%    66.7*     3    0.664     860
     2.07        8561    1476      1482       99.6%      64.8%     88.3%     8512    1.76     71.1%    53.0*     7    0.640     914
     2.01        8613    1473      1482       99.4%      68.3%     95.8%     8570    1.60     74.9%    60.6*    -1    0.628     928
     1.95        9048    1566      1571       99.7%      73.1%    112.2%     9004    1.41     80.2%    56.7*    -3    0.571     966
     1.89        9236    1580      1593       99.2%      82.6%    142.1%     9204    1.19     90.8%    56.3*    -5    0.504    1000
     1.84        9467    1618      1631       99.2%      92.8%    180.0%     9432    0.96    101.9%    43.2*     4    0.467    1007
     1.80        7927    1570      1701       92.3%     104.8%    225.2%     7811    0.70    116.1%    42.6*    -5    0.425     785
    total      130885   22874     23173       98.7%      38.3%     41.0%   130103    2.77     42.1%    92.0*     3    0.703   13546
</pre>
At this point, we run
 xdscc12 -w XSCALE.1.HKL | grep ^a | sort -nk6
and find that data sets 1 and 17 are wrongly included in the cloud of 51 data sets. Thus they are removed manually from XSCALE.INP.
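The manual removal can also be scripted. A sketch, assuming that the data set numbers count the INPUT_FILE= entries in their order of appearance in the XSCALE input file, and that every line between one INPUT_FILE= and the next (such as an NBATCH=/CORRECTIONS= line) belongs to the preceding entry; <code>drop_datasets</code> is a hypothetical helper, and the output file name in the usage comment is likewise an assumption:

```shell
# Remove the n-th INPUT_FILE= entries (and their per-file parameter lines)
# from an XSCALE input file read on stdin.
# usage: drop_datasets "1 17" < XSCALE.INP > XSCALE.trimmed.INP
drop_datasets() {
  awk -v drop="$1" '
    BEGIN {
      n = split(drop, d, " ")
      for (i = 1; i <= n; i++) skip[d[i]] = 1
    }
    # every INPUT_FILE= line starts a new data set
    /^[ \t]*INPUT_FILE=/ { count++ }
    # keep header lines (count == 0) and all data sets not marked for removal
    count == 0 || !(count in skip)
  '
}
```

Writing to a new file rather than editing in place keeps the original input available for comparison.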
After <code>xscale_isocluster -dim 2 -clu 1</code>,
 coot iso.pdb
now reveals a single cloud:
[[File:1g1c-19.png]]
We then re-run XSCALE with MERGE=TRUE. The resulting reflection file XSCALE.1.HKL serves as the REFERENCE_DATA_SET for a second round of integration with XDS.
 pointless xdsin XSCALE.1.HKL
gives
<pre>
  Spacegroup        TotProb SysAbsProb    Reindex        Conditions
    P 21 21 21 ( 19)    0.896  0.924                        h00: h=2n, 0k0: k=2n, 00l: l=2n (zones 1,2,3)
    ..........
    P 2 21 21 ( 18)    0.044  0.045                        0k0: k=2n, 00l: l=2n (zones 2,3)
    ..........
    P 21 21 2 ( 18)    0.015  0.015                        h00: h=2n, 0k0: k=2n (zones 1,2)
    ..........
    P 21 2 21 ( 18)    0.014  0.014                        h00: h=2n, 00l: l=2n (zones 1,3)
---------------------------------------------------------------
Space group confidence (= Sqrt(Score * (Score - NextBestScore))) =    0.87
Laue group confidence  (= Sqrt(Score * (Score - NextBestScore))) =    0.97
Selecting space group P 21 21 21 as there is a single space group with the highest score
<!--SUMMARY_BEGIN--> $TEXT:Result: $$ $$
Best Solution:    space group P 21 21 21
  Reindex operator:                  [h,k,l]               
  Laue group probability:            0.970
  Systematic absence probability:    0.924
  Total probability:                  0.896
  Space group confidence:            0.874
  Laue group confidence              0.966
  Unit cell:  38.30  79.10  79.10    90.00  90.00  90.00
  79.10 to  2.47  - Resolution range used for Laue group search
  79.10 to  1.80  - Resolution range in file, used for systematic absence check
</pre>
Thus we now know the space group: P2<sub>1</sub>2<sub>1</sub>2<sub>1</sub> (number 19).
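The reflection conditions listed by pointless (h00: h=2n, 0k0: k=2n, 00l: l=2n) mean that axial reflections with an odd index are systematically absent when all three axes are 2<sub>1</sub> screw axes. A minimal illustration of that rule; the function name is hypothetical:

```shell
# Exit status 0 if reflection (h k l) is systematically absent when all
# three axes are 2(1) screw axes, i.e. it lies on an axis (exactly one
# non-zero index) and that index is odd.
# usage: is_axial_absence_p212121 h k l
is_axial_absence_p212121() {
  nonzero=0
  idx=0
  for v in "$1" "$2" "$3"; do
    if [ "$v" -ne 0 ]; then
      nonzero=$((nonzero + 1))
      idx=$v
    fi
  done
  [ "$nonzero" -eq 1 ] && [ $((idx % 2)) -ne 0 ]
}

# Example: (3,0,0) is absent, (4,0,0) and (1,1,0) are not.
#   is_axial_absence_p212121 3 0 0 && echo absent
```

The alternatives with space group number 18 in the pointless table differ only in which of the three axial zones show these absences.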
== Round 2: using the REFERENCE_DATA_SET obtained from one cluster ==
The processing script integrate.rc is changed slightly, to a) use the REFERENCE_DATA_SET (the merged XSCALE.1.HKL from Round 1, referenced as ../reference.hkl), b) prevent adjustment of variances by CORRECT (MINIMUM_I/SIGMA=50; the adjustment should rather be done by XSCALE), and c) allow some radiation damage correction in XSCALE (NBATCH=3):
<pre>
#!/bin/bash
# globbing must stay enabled: the find command below relies on wedge* being expanded
for f in `seq 1 100`;
do
export OUT=wedge0`printf "%03d" $f`
export NAMES="$PWD/Illuin/microfocus/xtal"`printf "%03d" $f`"_1_00\?.img"
rm -rf $OUT
mkdir $OUT
cd $OUT
generate_XDS.INP $NAMES
echo REFERENCE_DATA_SET=../reference.hkl >> XDS.INP
echo MINIMUM_I/SIGMA=50 >>XDS.INP
sed -i s"/SPOT_RANGE=1 1/SPOT_RANGE=1 3/" XDS.INP
sed -i s"/SPACE_GROUP_NUMBER=0/SPACE_GROUP_NUMBER=19/" XDS.INP
sed -i s"/UNIT_CELL_CONSTANTS= 70 80 90/UNIT_CELL_CONSTANTS=38.3 79.1 79.1/" XDS.INP
sed -i s"/TRUSTED_REGION=0.0 1.2/TRUSTED_REGION=0 1/" XDS.INP
sed -i s"/INCLUDE_RESOLUTION_RANGE=50 0/INCLUDE_RESOLUTION_RANGE=99 1.8/" XDS.INP
/usr/local/bin/xds_par
cd ..
done
mkdir xscale
cd xscale
cat >XSCALE.INP <<eof
SPACE_GROUP_NUMBER= 19
UNIT_CELL_CONSTANTS= 38.3 79.1 79.1 90 90 90
OUTPUT_FILE=temp.ahkl
SAVE_CORRECTION_IMAGES=FALSE
eof
find $PWD/../wedge* -name XDS_ASCII.HKL | awk '{print "INPUT_FILE=",$0;print "NBATCH=3 CORRECTIONS=ALL"}' >> XSCALE.INP
</pre>
and we get in XSCALE.LP:
<pre>
      NOTE:      Friedel pairs are treated as different reflections.

 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

     8.04        2960     473       476       99.4%       6.2%      5.5%     2955   29.90      6.7%    99.8*    86*   2.824     166
     5.68        5486     890       894       99.6%       4.9%      5.9%     5478   27.38      5.3%    99.7*    86*   2.384     363
     4.64        6934    1136      1138       99.8%       4.9%      5.8%     6918   27.64      5.4%    99.8*    76*   1.829     480
     4.02        8066    1363      1367       99.7%       5.3%      5.9%     8045   26.67      5.9%    99.6*    57*   1.426     590
     3.59        9121    1535      1539       99.7%       6.1%      6.3%     9092   25.58      6.7%    99.6*    50*   1.298     666
     3.28       10222    1690      1694       99.8%       6.8%      6.8%    10203   24.69      7.5%    99.4*    36*   1.204     751
     3.04       10990    1831      1834       99.8%       8.5%      8.0%    10970   21.40      9.3%    99.3*    22*   1.086     827
     2.84       12065    1993      1999       99.7%      11.2%     11.1%    12038   17.68     12.2%    99.0*    24*   1.085     894
     2.68       12771    2120      2124       99.8%      14.7%     15.1%    12738   14.78     16.1%    98.4*    14*   0.960     952
     2.54       13054    2196      2198       99.9%      18.9%     20.2%    13026   12.53     20.8%    97.7*    13*   0.867     995
     2.42       14290    2372      2375       99.9%      24.9%     27.1%    14261   10.34     27.3%    96.1*     6    0.813    1083
     2.32       14704    2432      2438       99.8%      29.8%     32.5%    14676    9.21     32.6%    95.1*     8    0.843    1115
     2.23       15623    2582      2593       99.6%      33.0%     35.0%    15587    8.83     36.1%    93.0*     6    0.831    1180
     2.15       15732    2610      2613       99.9%      37.1%     39.2%    15697    8.10     40.6%    91.0*     8    0.818    1203
     2.08       16782    2788      2795       99.7%      44.1%     47.0%    16741    7.01     48.3%    88.3*     4    0.797    1276
     2.01       16783    2802      2809       99.8%      46.8%     48.7%    16747    6.54     51.2%    89.5*     3    0.807    1293
     1.95       18262    3043      3051       99.7%      56.5%     58.0%    18221    5.61     61.9%    85.9*     0    0.803    1402
     1.89       17810    2979      2988       99.7%      68.3%     69.8%    17769    4.63     74.8%    80.0*     7    0.864    1374
     1.84       18503    3112      3117       99.8%      87.5%     90.3%    18454    3.55     96.0%    69.6*     3    0.838    1435
     1.80       16130    2988      3185       93.8%     101.2%    110.5%    15959    2.77    111.7%    62.9*     2    0.798    1276
    total      256288   42935     43227       99.3%      13.4%     14.0%   255575   11.63     14.6%    99.6*    21*   0.975   19321
</pre>
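How far the anomalous signal extends can be read off the Anomal Corr column, where XSCALE flags significant correlations with an asterisk. The sketch below reports the last resolution shell, counting from low resolution, that is still flagged. It assumes the 14-column row layout of the table above; <code>anom_limit</code> is a hypothetical name:

```shell
# Print the resolution limit of the last consecutive shell (from low
# resolution) whose anomalous correlation carries a significance asterisk.
# Reads the XSCALE.LP statistics table on stdin.
anom_limit() {
  awk '
    # data rows start with a resolution like 8.04 and have 14 columns;
    # header lines and the "total" row do not match
    $1 ~ /^[0-9]+\.[0-9]+$/ && NF == 14 {
      if (!stopped && $12 ~ /\*$/) limit = $1
      else stopped = 1
    }
    END { if (limit != "") print limit }
  '
}
```

Applied to the table above, the flagged shells end at 2.54 Å, so restricting the substructure search to 3 Å is a conservative choice.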
The substructure (4 Se sites, located with anomalous data to 3 Å) and the structure (198 residues) can now easily be solved with [[ccp4com:SHELX C/D/E|hkl2map]].
== Result ==
=== SHELXC: anomalous CC<sub>1/2</sub> ===
[[File:Cc12ano.png]]
=== SHELXD: CCall ''versus'' CCweak, and histogram ===
[[File:Ccallcsccweak.png]]
[[File:Histcfom.png]]
=== SHELXE: contrast versus cycle, and PDB with structure ===
[[File:Contrastvscycle.png]]
[[File:Ribbon.png]]
Further optimization of processing may be possible, but is left as an exercise to the reader.

Latest revision as of 14:08, 29 June 2024

This article deals with how to process serial synchrotron crystallography (SSX) data.

The particular data we are processing are artificial and were prepared by James Holton. The files Illuin_microfocus_minimal_00[1-3].tar.bz2 can be downloaded and the data and problem are described on his microfocus challenge page, and in a paper.

The challenges are

  1. partial data sets: each of the 100 data sets has only 3 good frames of 1° oscillation; later frames have strong radiation damage
  2. the crystals decay to about 1/2 within these 3 frames
  3. the b and c axes are the same length, but the simulated crystals are orthorhombic. This makes it difficult to index them consistently - it is wrong to just merge them in a orthorhombic space group without resolving the indexing ambiguity, because that yields a pseudo-tetragonal twinned merged data set.

The solution is to use XSCALE for scaling, and xscale_isocluster for analysing the scaled data.

Round 1: processing the data, and determining the space group

In order to be able to merge the data in XSCALE, we must ensure that they are all processed in the same space group, with similar cell parameters. Some exploratory processing (not shown) and averaging of cell parameters reveals that IDXREF finds a primitive lattice with one axis of 38.3 Å, and two with 79.1 Å; angles are 90°. The data go to 1.8 Å; beyond that, the intensities suddenly drop to 0 - presumably because James Holton simulated them only that far. Using the following as the processing script integrate.rc:

#!/bin/bash -f
for f in `seq 1 100`;
do
 export OUT=wedge0`printf "%03d" $f`
 export NAMES="$PWD/Illuin/microfocus/xtal"`printf "%03d" $f`"_1_00\?.img"
 rm -rf $OUT
 mkdir $OUT
 cd $OUT
 generate_XDS.INP $NAMES
 sed -i s"/SPOT_RANGE=1 1/SPOT_RANGE=1 3/" XDS.INP
 sed -i s"/SPACE_GROUP_NUMBER=0/SPACE_GROUP_NUMBER=1/" XDS.INP
 sed -i s"/UNIT_CELL_CONSTANTS= 70 80 90/UNIT_CELL_CONSTANTS=38.3 79.1 79.1/" XDS.INP
 sed -i s"/TRUSTED_REGION=0.0 1.2/TRUSTED_REGION=0 1/" XDS.INP
 sed -i s"/INCLUDE_RESOLUTION_RANGE=50 0/INCLUDE_RESOLUTION_RANGE=99 1.8/" XDS.INP
 /usr/local/bin/xds_par
 cd ..
done
mkdir xscale
cd xscale
cat >XSCALE.INP <<eof
SPACE_GROUP_NUMBER= 1
UNIT_CELL_CONSTANTS= 38.3 79.1 79.1 90 90 90
OUTPUT_FILE=temp.ahkl
SAVE_CORRECTION_IMAGES=FALSE
FRIEDEL'S_LAW=TRUE
eof
find $PWD/../wedge* -name XDS_ASCII.HKL | awk '{print "INPUT_FILE=",$0;print "NBATCH=1 CORRECTIONS=ALL"}' >> XSCALE.INP

we obtain in P1

 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

     8.03        3014     908       958       94.8%      44.5%     42.0%     2896    2.55     52.1%    65.0*     3    0.983     231
     5.68        5502    1679      1788       93.9%      46.8%     42.5%     5239    2.50     54.8%    50.3*     6    1.001     390
     4.64        6996    2164      2292       94.4%      47.5%     42.3%     6656    2.48     55.9%    68.4*     5    1.080     495
     4.01        8079    2580      2735       94.3%      48.7%     42.5%     7591    2.38     57.3%    50.0*     2    1.106     557
     3.59        9167    2904      3099       93.7%      52.1%     42.7%     8694    2.36     61.7%    43.6*    -6    1.017     599
     3.28       10276    3226      3397       95.0%      53.3%     43.3%     9728    2.35     62.8%    36.0*     1    1.104     708
     3.03       11040    3472      3687       94.2%      54.5%     44.3%    10500    2.17     64.2%    44.4*     2    1.044     728
     2.84       12022    3771      3977       94.8%      55.9%     47.2%    11424    1.97     65.8%    36.2*     3    0.999     835
     2.68       12705    3985      4227       94.3%      58.5%     51.0%    12065    1.78     68.8%    37.8*    -3    0.934     898
     2.54       13370    4252      4489       94.7%      59.5%     56.2%    12670    1.61     70.5%    30.1*     4    0.887     869
     2.42       14299    4505      4744       95.0%      62.4%     63.6%    13594    1.46     73.7%    30.2*    -2    0.824     979
     2.32       14835    4647      4915       94.5%      63.8%     70.0%    14083    1.35     75.1%    29.9*    -2    0.765    1041
     2.23       15599    4917      5181       94.9%      65.7%     72.6%    14809    1.31     77.5%    27.6*    -1    0.756    1075
     2.15       15888    4965      5272       94.2%      65.1%     78.6%    15117    1.28     76.9%    26.8*    -2    0.708    1115
     2.07       16872    5324      5601       95.1%      69.1%     88.1%    16035    1.14     81.6%    22.2*     3    0.687    1119
     2.01       16856    5349      5649       94.7%      73.4%     92.5%    15988    1.06     86.5%    19.7*    -3    0.673    1144
     1.95       17842    5666      5976       94.8%      76.7%    105.9%    16959    0.97     90.8%    20.7*    -8    0.606    1189
     1.89       18102    5767      6069       95.0%      84.4%    127.9%    17152    0.85     99.9%    15.1*    -1    0.590    1183
     1.84       18633    5933      6256       94.8%      92.8%    162.0%    17667    0.72    109.8%    17.6*     0    0.533    1236
     1.80       15519    5405      6479       83.4%     103.0%    194.1%    14280    0.58    122.7%    18.2*     1    0.503     940
    total      256616   81419     86791       93.8%      54.3%     51.3%   243147    1.43     64.0%    64.6*     0    0.788   17331

and feed this to pointless:

pointless xdsin temp.ahkl

which tells us

Scores for each symmetry element

Nelmt  Lklhd  Z-cc    CC        N  Rmeas    Symmetry & operator (in Lattice Cell)

  1   0.854   5.41   0.54     801  0.706     identity
  2   0.842   4.62   0.46     785  0.819 **  2-fold l ( 0 0 1) {-h,-k,l}
  3   0.867   5.13   0.51     746  0.912 **  2-fold k ( 0 1 0) {-h,k,-l}
  4   0.837   5.64   0.56     735  0.807 **  2-fold h ( 1 0 0) {h,-k,-l}
  5   0.869   4.96   0.50     742  0.757 **  2-fold   ( 1-1 0) {-k,-h,-l}
  6   0.846   5.52   0.55     719  0.789 **  2-fold   ( 1 1 0) {k,h,-l}
  7   0.852   5.44   0.54    1325  1.146 **  4-fold l ( 0 0 1) {-k,h,l}{k,-h,l}
...
...
Best Solution:    space group P 42 21 2

   Reindex operator:                   [k,l,h]                 
   Laue group probability:             0.989
   Systematic absence probability:     0.915
   Total probability:                  0.905
   Space group confidence:             0.874
   Laue group confidence               0.986

   Unit cell:   79.10  79.10  38.30     90.00  90.00  90.00

   79.10 to  13.70   - Resolution range used for Laue group search

   79.10 to   1.80   - Resolution range in file, used for systematic absence check

   Number of batches in file:      3

The data do not appear to be twinned, from the L-test

$$ <!--SUMMARY_END-->


HKLIN spacegroup: P 1  primitive triclinic

$TEXT:Warning:$$ $$

The input crystal system is primitive triclinic
 (Cell:   38.30  79.10  79.10     90.00  90.00  90.00)
The crystal system chosen for output is primitive tetragonal
 (Cell:   79.10  79.10  38.30     90.00  90.00  90.00)

Based on the P4(2)2(1)2 suggestion, we may try to modify the header of XSCALE.INP to

SPACE_GROUP_NUMBER= 94
UNIT_CELL_CONSTANTS= 79.1 79.1 38.3 90 90 90
OUTPUT_FILE=temp.ahkl
SAVE_CORRECTION_IMAGES=FALSE
FRIEDEL'S_LAW=TRUE
REIDX=0 1 0 0   0 0 1 0  1 0 0 0

where the last line takes care of the shuffling of axes into the order k,l,h, (after all, the XDS_ASCII.HKL are in P1 with a,b,c of 38.3,79.1,79.1) , and obtain

 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

     8.03        2978     167       167      100.0%      53.6%     45.8%     2978    5.94     55.1%    99.2*    22    1.190      76
     5.68        5488     274       274      100.0%      54.0%     46.1%     5488    6.12     55.4%    97.0*    20    0.915     175
     4.64        6976     338       338      100.0%      55.4%     46.1%     6976    6.25     57.0%    99.1*    15    0.983     237
     4.01        8069     390       390      100.0%      57.5%     46.3%     8069    6.01     59.0%    93.7*     8    0.991     294
     3.59        9191     440       440      100.0%      63.9%     46.7%     9191    5.80     65.5%    89.2*     3    1.071     338
     3.28       10239     474       474      100.0%      63.8%     47.0%    10239    5.85     65.4%    89.4*     4    1.119     375
     3.03       11037     511       511      100.0%      66.0%     47.5%    11037    5.33     67.6%    91.7*     3    1.068     412
     2.84       12014     547       547      100.0%      69.6%     49.1%    12014    4.80     71.2%    82.2*    -1    1.092     447
     2.68       12698     580       580      100.0%      72.2%     51.0%    12698    4.34     73.9%    83.8*    -7    0.969     478
     2.54       13360     612       612      100.0%      73.5%     54.1%    13360    3.98     75.3%    73.4*     4    1.025     511
     2.42       14299     642       642      100.0%      76.8%     58.2%    14299    3.59     78.6%    57.0*     6    1.016     545
     2.32       14827     667       667      100.0%      77.8%     62.3%    14827    3.38     79.6%    70.3*     1    0.924     563
     2.23       15588     698       698      100.0%      79.5%     64.6%    15588    3.22     81.3%    64.9*    -1    0.914     597
     2.15       15888     705       705      100.0%      79.3%     68.0%    15888    3.23     81.1%    52.5*    -5    0.882     614
     2.07       16867     754       754      100.0%      82.7%     74.7%    16867    2.92     84.6%    50.1*     3    0.920     647
     2.01       16847     754       754      100.0%      86.1%     77.3%    16847    2.73     88.1%    47.6*    -3    0.839     658
     1.95       17842     799       799      100.0%      90.4%     86.7%    17842    2.47     92.4%    49.3*     1    0.822     696
     1.89       18095     810       811       99.9%      96.8%    101.2%    18095    2.21     99.1%    44.6*    -4    0.773     707
     1.84       18633     829       829      100.0%     106.4%    126.3%    18633    1.90    108.9%    39.6*    -6    0.730     736
     1.80       15510     824       863       95.5%     118.1%    151.4%    15500    1.46    121.2%    32.3*     2    0.688     699
    total      256446   11815     11855       99.7%      64.9%     51.6%   256436    3.61     66.5%    97.9*     1    0.910    9805

Analysis with

xscale_isocluster -dim 2 -clu 2 temp.ahkl

yields a iso.pdb which is not at all a single cluster; it is a severely elongated single cloud:

1g1c-94.png

(If the space group were correct, the result of xscale_isocluster should look similar to this:

Lyso-xscale-isocluster.png

which is from a lysozyme SSX data collection performed at the SLS; outliers are labelled. In this case, the data are truely tetragonal.)

We must now investigate whether the data have lower than tetragonal symmetry. XSCALEing with

SPACE_GROUP_NUMBER=16
UNIT_CELL_CONSTANTS=38.3 79.1 79.1 90 90 90

gives a new temp.ahkl, with orthorhombic symmetry.

xscale_isocluster -dim 2 -clu 2 temp.ahkl

gives

 psi=  0.1692468      nhalo=           0
cluster:  1 center:     2 elements:    51 core:    51 halo:     0
cluster:  2 center:     6 elements:    49 core:    49 halo:     0

and prepares XSCALE.1.INP (and XSCALE.2.INP) for further use (these two files collect the differently, but internally-consistently indexed XDS_ASCII.HKL files).

coot iso.pdb 

shows

Coot.png

and thus reveals two well separated clouds, corresponding to the two possible indexing modes of the data in an orthorhombic space group.

Using XSCALE.1.INP with its 51 XDS_ASCII.HKL, and FRIEDEL'S_LAW=TRUE, we get

 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

     8.03        1493     297       306       97.1%      11.8%     23.7%     1467    6.04     13.0%    98.2*    52*   0.662     123
     5.68        2829     514       521       98.7%      18.9%     24.2%     2796    5.98     20.9%    96.1*    26*   0.778     258
     4.64        3576     638       646       98.8%      23.3%     24.2%     3554    6.07     25.7%    93.3*    12    0.829     346
     4.01        4140     748       756       98.9%      28.2%     24.5%     4105    5.84     31.0%    89.4*    -5    0.818     418
     3.59        4735     838       852       98.4%      30.9%     25.0%     4709    5.72     33.9%    86.7*     5    0.983     470
     3.28        5268     912       921       99.0%      34.7%     25.8%     5228    5.52     38.0%    85.9*     0    1.005     533
     3.03        5664     982       994       98.8%      37.8%     27.4%     5634    4.90     41.4%    82.1*     4    1.031     563
     2.84        6114    1065      1068       99.7%      40.4%     31.7%     6082    4.13     44.4%    82.5*     5    0.963     613
     2.68        6486    1127      1133       99.5%      44.5%     37.2%     6450    3.54     48.9%    74.8*     1    0.824     644
     2.54        6819    1188      1197       99.2%      48.2%     44.6%     6784    3.01     53.0%    70.4*     1    0.816     709
     2.42        7278    1249      1259       99.2%      51.9%     54.7%     7249    2.56     56.9%    70.6*     4    0.751     756
     2.32        7595    1297      1304       99.5%      55.9%     63.4%     7555    2.26     61.5%    58.5*     4    0.729     809
     2.23        7943    1361      1371       99.3%      57.8%     66.4%     7903    2.16     63.3%    63.5*    -3    0.687     844
     2.15        8093    1375      1385       99.3%      60.1%     75.4%     8054    2.03     65.9%    66.7*     3    0.664     860
     2.07        8561    1476      1482       99.6%      64.8%     88.3%     8512    1.76     71.1%    53.0*     7    0.640     914
     2.01        8613    1473      1482       99.4%      68.3%     95.8%     8570    1.60     74.9%    60.6*    -1    0.628     928
     1.95        9048    1566      1571       99.7%      73.1%    112.2%     9004    1.41     80.2%    56.7*    -3    0.571     966
     1.89        9236    1580      1593       99.2%      82.6%    142.1%     9204    1.19     90.8%    56.3*    -5    0.504    1000
     1.84        9467    1618      1631       99.2%      92.8%    180.0%     9432    0.96    101.9%    43.2*     4    0.467    1007
     1.80        7927    1570      1701       92.3%     104.8%    225.2%     7811    0.70    116.1%    42.6*    -5    0.425     785
    total      130885   22874     23173       98.7%      38.3%     41.0%   130103    2.77     42.1%    92.0*     3    0.703   13546

At this point, we run

xdscc12 -w XSCALE.1.HKL | grep ^a | sort -nk6

and find that data sets 1 and 17 are wrongly included in the cloud of 51 data sets. Thus they are removed manually from XSCALE.INP.
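The manual removal can also be scripted. The following is a minimal sketch, not part of the original procedure. It assumes that the data set numbers reported by xdscc12 follow the order of the INPUT_FILE= lines in XSCALE.INP, and that any lines between one INPUT_FILE= line and the next belong to the first of the two. The function name drop_datasets and the demo paths are hypothetical.

```shell
#!/bin/bash
# Sketch: remove selected data sets from an XSCALE.INP-style file.
# ASSUMPTION: data set number N means the N-th INPUT_FILE= line, and the
# lines after it (up to the next INPUT_FILE=) belong to that data set.
drop_datasets() {   # usage: drop_datasets FILE N [N ...]
  local file=$1; shift
  awk -v drop="$*" '
    BEGIN { n = split(drop, d, " "); for (i = 1; i <= n; i++) skip[d[i]] = 1 }
    /^ *INPUT_FILE=/ { count++ }          # a new data set starts here
    !(count in skip)                      # print only lines of data sets we keep
  ' "$file"
}

# Demo on a made-up file with three data sets: drop numbers 1 and 3
tmp=$(mktemp)
cat > "$tmp" <<eof
OUTPUT_FILE=XSCALE.1.HKL
INPUT_FILE= /data/wedge0001/XDS_ASCII.HKL
INPUT_FILE= /data/wedge0002/XDS_ASCII.HKL
INPUT_FILE= /data/wedge0003/XDS_ASCII.HKL
eof
drop_datasets "$tmp" 1 3
rm -f "$tmp"
```

For the case above, one would call something like drop_datasets XSCALE.INP 1 17 > XSCALE.INP.new and inspect the result before replacing the original file.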

After xscale_isocluster -dim 2 -clu 1,

coot iso.pdb

now reveals a single cloud:

1g1c-19.png

We then re-run XSCALE with MERGE=TRUE. The resulting reflection output file XSCALE.1.HKL is then used as REFERENCE_DATA_SET for a second round of integration with XDS.

pointless xdsin XSCALE.1.HKL

gives

   Spacegroup         TotProb SysAbsProb     Reindex         Conditions

    P 21 21 21 ( 19)    0.896  0.924                         h00: h=2n, 0k0: k=2n, 00l: l=2n (zones 1,2,3)
    ..........
     P 2 21 21 ( 18)    0.044  0.045                         0k0: k=2n, 00l: l=2n (zones 2,3)
    ..........
     P 21 21 2 ( 18)    0.015  0.015                         h00: h=2n, 0k0: k=2n (zones 1,2)
    ..........
     P 21 2 21 ( 18)    0.014  0.014                         h00: h=2n, 00l: l=2n (zones 1,3)


---------------------------------------------------------------


Space group confidence (= Sqrt(Score * (Score - NextBestScore))) =     0.87

Laue group confidence  (= Sqrt(Score * (Score - NextBestScore))) =     0.97

Selecting space group P 21 21 21 as there is a single space group with the highest score

<!--SUMMARY_BEGIN--> $TEXT:Result: $$ $$
Best Solution:    space group P 21 21 21

   Reindex operator:                   [h,k,l]                 
   Laue group probability:             0.970
   Systematic absence probability:     0.924
   Total probability:                  0.896
   Space group confidence:             0.874
   Laue group confidence               0.966

   Unit cell:   38.30  79.10  79.10     90.00  90.00  90.00

   79.10 to   2.47   - Resolution range used for Laue group search

   79.10 to   1.80   - Resolution range in file, used for systematic absence check

Thus we now know the space group.
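The space group confidence in the pointless log can be reproduced from the table with the formula the log itself prints. The sketch below takes Score to be the TotProb of the best solution (0.896) and NextBestScore to be that of the runner-up (0.044). Identifying Score with TotProb is an assumption; it is a quick plausibility check, not part of the processing. The function name confidence is hypothetical.

```shell
# Sketch: confidence = sqrt(Score * (Score - NextBestScore)),
# ASSUMING Score = TotProb of the best and second-best space groups above.
confidence() {   # usage: confidence SCORE NEXT_BEST_SCORE
  awk -v s="$1" -v n="$2" 'BEGIN { printf "%.3f\n", sqrt(s * (s - n)) }'
}

confidence 0.896 0.044   # prints 0.874, the "Space group confidence" in the log
```

The large gap between the best and second-best probability is what drives the confidence close to the probability itself.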

Round 2: using the REFERENCE_DATA_SET obtained from one cluster

The processing script integrate.rc is modified slightly, to a) use the REFERENCE_DATA_SET, b) prevent adjustment of variances by CORRECT (this should rather be done by XSCALE), and c) allow some radiation damage correction in XSCALE:

#!/bin/bash -f
for f in `seq 1 100`;
do
 export OUT=wedge0`printf "%03d" $f`
 export NAMES="$PWD/Illuin/microfocus/xtal"`printf "%03d" $f`"_1_00\?.img"
 rm -rf $OUT
 mkdir $OUT
 cd $OUT
 generate_XDS.INP $NAMES
 echo REFERENCE_DATA_SET=../reference.hkl >> XDS.INP
 echo MINIMUM_I/SIGMA=50 >>XDS.INP
 sed -i s"/SPOT_RANGE=1 1/SPOT_RANGE=1 3/" XDS.INP
 sed -i s"/SPACE_GROUP_NUMBER=0/SPACE_GROUP_NUMBER=19/" XDS.INP
 sed -i s"/UNIT_CELL_CONSTANTS= 70 80 90/UNIT_CELL_CONSTANTS=38.3 79.1 79.1/" XDS.INP
 sed -i s"/TRUSTED_REGION=0.0 1.2/TRUSTED_REGION=0 1/" XDS.INP
 sed -i s"/INCLUDE_RESOLUTION_RANGE=50 0/INCLUDE_RESOLUTION_RANGE=99 1.8/" XDS.INP
 /usr/local/bin/xds_par
 cd ..
done
mkdir xscale
cd xscale
cat >XSCALE.INP <<eof
SPACE_GROUP_NUMBER= 19
UNIT_CELL_CONSTANTS= 38.3 79.1 79.1 90 90 90
OUTPUT_FILE=temp.ahkl
SAVE_CORRECTION_IMAGES=FALSE
eof
find $PWD/../wedge* -name XDS_ASCII.HKL | awk '{print "INPUT_FILE=",$0;print "NBATCH=3 CORRECTIONS=ALL"}' >> XSCALE.INP

After running XSCALE with this input, we get as XSCALE.LP:

       NOTE:      Friedel pairs are treated as different reflections.

 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  CC(1/2)  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr

     8.04        2960     473       476       99.4%       6.2%      5.5%     2955   29.90      6.7%    99.8*    86*   2.824     166
     5.68        5486     890       894       99.6%       4.9%      5.9%     5478   27.38      5.3%    99.7*    86*   2.384     363
     4.64        6934    1136      1138       99.8%       4.9%      5.8%     6918   27.64      5.4%    99.8*    76*   1.829     480
     4.02        8066    1363      1367       99.7%       5.3%      5.9%     8045   26.67      5.9%    99.6*    57*   1.426     590
     3.59        9121    1535      1539       99.7%       6.1%      6.3%     9092   25.58      6.7%    99.6*    50*   1.298     666
     3.28       10222    1690      1694       99.8%       6.8%      6.8%    10203   24.69      7.5%    99.4*    36*   1.204     751
     3.04       10990    1831      1834       99.8%       8.5%      8.0%    10970   21.40      9.3%    99.3*    22*   1.086     827
     2.84       12065    1993      1999       99.7%      11.2%     11.1%    12038   17.68     12.2%    99.0*    24*   1.085     894
     2.68       12771    2120      2124       99.8%      14.7%     15.1%    12738   14.78     16.1%    98.4*    14*   0.960     952
     2.54       13054    2196      2198       99.9%      18.9%     20.2%    13026   12.53     20.8%    97.7*    13*   0.867     995
     2.42       14290    2372      2375       99.9%      24.9%     27.1%    14261   10.34     27.3%    96.1*     6    0.813    1083
     2.32       14704    2432      2438       99.8%      29.8%     32.5%    14676    9.21     32.6%    95.1*     8    0.843    1115
     2.23       15623    2582      2593       99.6%      33.0%     35.0%    15587    8.83     36.1%    93.0*     6    0.831    1180
     2.15       15732    2610      2613       99.9%      37.1%     39.2%    15697    8.10     40.6%    91.0*     8    0.818    1203
     2.08       16782    2788      2795       99.7%      44.1%     47.0%    16741    7.01     48.3%    88.3*     4    0.797    1276
     2.01       16783    2802      2809       99.8%      46.8%     48.7%    16747    6.54     51.2%    89.5*     3    0.807    1293
     1.95       18262    3043      3051       99.7%      56.5%     58.0%    18221    5.61     61.9%    85.9*     0    0.803    1402
     1.89       17810    2979      2988       99.7%      68.3%     69.8%    17769    4.63     74.8%    80.0*     7    0.864    1374
     1.84       18503    3112      3117       99.8%      87.5%     90.3%    18454    3.55     96.0%    69.6*     3    0.838    1435
     1.80       16130    2988      3185       93.8%     101.2%    110.5%    15959    2.77    111.7%    62.9*     2    0.798    1276
    total      256288   42935     43227       99.3%      13.4%     14.0%   255575   11.63     14.6%    99.6*    21*   0.975   19321
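To compare such tables at a glance, the total row can be reduced to a few key numbers. This is a minimal sketch, assuming the 14-column layout shown on this page; the function name xscale_totals is hypothetical. A real XSCALE.LP may contain several such tables, so the sketch reports every total row it finds.

```shell
# Sketch: summarise the "total" rows of XSCALE.LP-style statistics tables.
# ASSUMPTION: columns are as in the tables on this page, i.e. field 5 is
# completeness, 9 is I/SIGMA, 10 is R-meas and 11 is CC(1/2).
xscale_totals() {   # reads table text on stdin
  awk '$1 == "total" && NF == 14 {
         cc = $11; sub(/\*$/, "", cc)      # drop the significance asterisk
         printf "completeness=%s I/sigma=%s R-meas=%s CC1/2=%s\n", $5, $9, $10, cc
       }'
}

# Demo with the two total rows from this page (round 1, then round 2)
xscale_totals <<'eof'
    total      130885   22874     23173       98.7%      38.3%     41.0%   130103    2.77     42.1%    92.0*     3    0.703   13546
    total      256288   42935     43227       99.3%      13.4%     14.0%   255575   11.63     14.6%    99.6*    21*   0.975   19321
eof
```

On an actual run this would be called as xscale_totals < XSCALE.LP. The two demo rows are not strictly comparable, because the first table was computed with FRIEDEL'S_LAW=TRUE while the second treats Friedel pairs as different reflections.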

The substructure (locating 4 Se with anomalous data to 3 Å) and structure (198 residues) can now easily be solved with hkl2map:

Result

SHELXC: anomalous CC1/2

Cc12ano.png

SHELXD: CCall versus CCweak, and histogram

Ccallcsccweak.png

Histcfom.png

SHELXE: contrast versus cycle, and PDB with structure

Contrastvscycle.png

Ribbon.png

Further optimization of processing may be possible, but is left as an exercise to the reader.