Thaumatin ACA2014

1800 frames of 0.2° were collected on a Pilatus 6M detector at 24-IDC, Advanced Photon Source. Raw data are at https://wiki.uni-konstanz.de/pub/datasets/ACA2014_thaumatin/ . What is special about these data:

  • tiny spots (some are only one pixel)
  • fast: 0.2s total time for each frame. Since the readout time is only 3ms, a shutter is not used ("shutterless data collection").
  • the header is incorrect: the direct beam position is at 1288.9 1262.6 rather than at 1265.47 1273.87 (as the header claims). This may lead to mis-indexing. Below it is demonstrated how to identify this problem.
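
To get a feeling for the numbers in the list above, the following minimal sketch computes the size of the beam-position error and the totals of the data collection. The pixel size of 0.172 mm is an assumption (the nominal Pilatus value); it is not taken from the frame headers.

```python
import math

# Assumption: nominal Pilatus pixel edge length in mm (not read from the header).
PIXEL_SIZE_MM = 0.172

header_beam = (1265.47, 1273.87)  # beam position claimed by the header (pixels)
true_beam = (1288.9, 1262.6)      # beam position quoted above as correct (pixels)


def beam_offset(given, actual, pixel_size_mm=PIXEL_SIZE_MM):
    """Return the shift (dx, dy), its length in pixels and its length in mm."""
    dx = actual[0] - given[0]
    dy = actual[1] - given[1]
    pixels = math.hypot(dx, dy)
    return dx, dy, pixels, pixels * pixel_size_mm


dx, dy, offset_px, offset_mm = beam_offset(header_beam, true_beam)

n_frames = 1800
total_rotation_deg = n_frames * 0.2  # 0.2 degrees per frame
total_time_s = n_frames * 0.2        # 0.2 s per frame

print(f"beam shift: dx={dx:.2f} dy={dy:.2f} pixels, |shift|={offset_px:.1f} pixels")
print(f"which is about {offset_mm:.1f} mm on the detector (assumed pixel size)")
print(f"total rotation {total_rotation_deg:.0f} degrees in {total_time_s:.0f} s")
```

A shift of this size is many times the size of the one-pixel spots mentioned above, which makes it plausible that indexing can be thrown off by the header values.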

We start processing by creating an empty directory, and changing to this directory. This can be conveniently done using XDSGUI / Projects / navigate to the directory with the Thaumatin frames / create empty folder.


Running XDSGUI

We click on the "Frame tab" and load one of the frames. Clicking "Generate XDS.INP" creates XDS.INP. This un-greys the XDS.INP tab, and the blind areas of the detector are shown to be masked (red rectangles). Furthermore, a green "x" appears where the header says that the beam position should be.

Next, we click the "XDS.INP" tab. We may manually change parameters, like the resolution cutoff used for processing (INCLUDE_RESOLUTION_RANGE= ). In this case we leave it at its default (50), and don't specify a high-resolution cutoff (0).

After reviewing and possibly changing parameters, we click "Save" and then "Run XDS". After a few seconds, the XYCORR, INIT, COLSPOT, IDXREF tabs become black, indicating that XYCORR.LP, INIT.LP, COLSPOT.LP and IDXREF.LP have been created.

We may take a look at the XYCORR, INIT, and COLSPOT tabs to see what XDS has to say about these steps.

IDXREF (indexing)

Next, we click the IDXREF tab and find, after scrolling down:

   #  COORDINATES OF REC. BASIS VECTOR    LENGTH   1/LENGTH

    1   0.0048784-0.0028812 0.0034946  0.0066567     150.22
    2  -0.0111804-0.0117066 0.0059615  0.0172507      57.97
    3   0.0035904-0.0102197-0.0133848  0.0172188      58.08
 
 CLUSTER COORDINATES AND INDICES WITH RESPECT TO REC. LATTICE BASIS VECTORS 
 
   #  COORDINATES OF VECTOR CLUSTER   FREQUENCY       CLUSTER INDICES   
    1  0.0048672-0.0029036 0.0034869     2326.      1.00      0.00      0.00
    2 -0.0209391-0.0058398-0.0010916     2077.     -2.01      0.99     -0.00
    3  0.0097412-0.0057891 0.0069462     2065.      2.00      0.00      0.00
    4  0.0160686 0.0087446-0.0024072     2051.      1.01     -1.00      0.00
    5 -0.0258097-0.0029199-0.0045794     1995.     -3.01      0.99     -0.00
    6 -0.0111928-0.0116435 0.0059000     1900.     -0.01      1.00      0.00
    7  0.0306841 0.0000146 0.0080409     1816.      4.01     -0.99      0.00
    8 -0.0146229 0.0086452-0.0104174     1764.     -2.99      0.00     -0.00
    9 -0.0294185 0.0072888 0.0088294     1720.     -3.01      0.99     -1.00
   10  0.0245344-0.0044011-0.0123085     1705.      2.01     -0.99      1.00
   11  0.0063073 0.0145033-0.0093667     1673.     -0.99     -1.00     -0.00
   12  0.0133648-0.0160316-0.0064337     1664.      2.00      0.00      1.00
   13  0.0418771 0.0116728 0.0021928     1661.      4.03     -1.99      0.00
   14 -0.0342963 0.0101537 0.0053161     1652.     -4.01      0.99     -1.00
   15  0.0035949-0.0101595-0.0132820     1627.      0.00     -0.00      0.99
   16  0.0355623-0.0028693 0.0115246     1600.      5.01     -0.99      0.01
   17 -0.0182374 0.0188742 0.0029450     1591.     -3.00      0.00     -1.00
 ...

This shows the cell parameters as expected, and the difference vectors are indeed close to integers, as they should be. Continuing ...
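
The relation between the columns of the first table can be checked by hand. The sketch below parses the three basis-vector lines (note that negative numbers are glued to the preceding column in the fixed-width output) and recomputes LENGTH and 1/LENGTH. The parsing helper is hypothetical and not part of XDS. Keep in mind that 1/LENGTH is a lattice-plane spacing, which equals the real-space axis length only when the axes are orthogonal, as they are here.

```python
import math
import re

# Matches signed decimals even when a minus sign is glued to the previous number.
NUMBER = re.compile(r"-?\d+\.\d+")

BASIS_LINES = [
    "    1   0.0048784-0.0028812 0.0034946  0.0066567     150.22",
    "    2  -0.0111804-0.0117066 0.0059615  0.0172507      57.97",
    "    3   0.0035904-0.0102197-0.0133848  0.0172188      58.08",
]


def parse_basis_line(line):
    """Split one IDXREF.LP basis-vector line into index, vector and the two lengths."""
    index_text, rest = line.split(None, 1)
    x, y, z, length, inv_length = (float(v) for v in NUMBER.findall(rest))
    return int(index_text), (x, y, z), length, inv_length


results = []
for line in BASIS_LINES:
    index, vector, length, inv_length = parse_basis_line(line)
    norm = math.sqrt(sum(c * c for c in vector))  # recomputed LENGTH
    results.append((index, norm, 1.0 / norm, length, inv_length))
    print(f"vector {index}: |v|={norm:.7f} (log: {length}), 1/|v|={1.0 / norm:.2f} (log: {inv_length})")
```

The recomputed values agree with the log to within rounding, giving roughly 150.2, 58.0 and 58.1 Å.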

 ***** RESULTS FROM LOCAL INDEXING OF   3000 OBSERVED SPOTS *****

 MAXIMUM MAGNITUDE OF INDEX DIFFERENCES ALLOWED    8
 MAXIMUM ALLOWED DEVIATION FROM INTEGERAL INDICES     0.050
 MIMINUM QUALITY OF INDICES FOR EACH SPOT IN A SUBTREE    0.80
 QUALITY OF INDICES REQUIRED TO INCLUDE SECOND SUBTREE    0.00
 NUMBER OF SUBTREES     16

  SUBTREE    POPULATION

     1         3000

This shows that the 3000 strongest reflections can be indexed with a single lattice - good !

Next ...

 NUMBER OF ACCEPTED SPOTS FROM LARGEST SUBTREE  2963

 ***** SELECTION OF THE INDEX ORIGIN OF THE REFLECTIONS *****
 The origin of the reflection indices determined so far is   
 0,0,0 by default which is usually correct. In certain critical
 cases it may happen that this automatic choice is wrong which 
 leads to misindexing of the reflections by a constant offset. 
 You may replace the default by specifying INDEX_ORIGIN= h k l 
 in the input file "XDS.INP" and rerun the IDXREF step.        
 Below you find a list of possible alternatives together with a
 measure of their likelihood.
 QUALITY  small values mean a high likelihood for this offset 
 DELTA    is the angle between given and refined beam direction
 XD,YD    computed direct beam position (pixels) on detector 
          given beam position (pixel):  1265.47  1273.87
 X,Y,Z    computed coordinates of the direct beam wave vector
 DH,DK,DL mean absolute difference between observed and 
          fitted indices

  INDEX_   QUALITY  DELTA    XD       YD       X       Y       Z       DH      DK      DL
  ORIGIN

  0  0  0     14.4    1.0   1289.1   1262.5  0.0131 -0.0063  0.8065    0.02    0.03    0.06
  0  0  1     31.4    0.7   1280.3   1266.3  0.0082 -0.0042  0.8065    0.16    0.05    0.37
  1  0  3     42.8    0.9   1242.6   1267.8 -0.0127 -0.0034  0.8065    0.10    0.22    0.15
  1  0  2     61.3    0.7   1251.3   1264.0 -0.0078 -0.0055  0.8065    0.22    0.23    0.34
  1  0  4     74.1    1.3   1233.8   1271.6 -0.0176 -0.0013  0.8064    0.14    0.22    0.40
  0  0  2     99.9    0.3   1271.5   1270.1  0.0034 -0.0021  0.8066    0.32    0.10    0.73
  1  0  1    131.4    0.6   1260.2   1260.2 -0.0030 -0.0076  0.8065    0.38    0.23    0.70
 ...

In this case, the "QUALITY" of the default indexing (first line, starting with 0 0 0) is surprisingly bad (1.0 would be a good value), and the angular distance between the given and refined beam direction is 1.0°, which is actually a lot. Nevertheless, the "mean absolute difference between observed and fitted indices" DH, DK, DL are the smallest in the list, which gives us some confidence in the indexing.
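
The DELTA column can be reproduced from the X, Y, Z columns of the first table. The sketch below assumes that the "given" beam direction is along the z axis (0 0 1), which is not stated in the log excerpt; under that assumption DELTA is simply the angle between the refined wave vector and the z axis. It also assumes that the length of the wave vector is 1/wavelength.

```python
import math


def angle_to_z_axis_deg(x, y, z):
    """Angle in degrees between the vector (x, y, z) and the z axis."""
    return math.degrees(math.atan2(math.hypot(x, y), z))


# (X, Y, Z, DELTA as printed) for three rows of the table above
rows = {
    (0, 0, 0): (0.0131, -0.0063, 0.8065, 1.0),
    (0, 0, 1): (0.0082, -0.0042, 0.8065, 0.7),
    (0, 0, 2): (0.0034, -0.0021, 0.8066, 0.3),
}

computed = {}
for origin, (x, y, z, printed_delta) in rows.items():
    computed[origin] = angle_to_z_axis_deg(x, y, z)
    print(f"INDEX_ORIGIN {origin}: computed {computed[origin]:.2f}, printed {printed_delta}")

# Assumption: |wave vector| = 1 / wavelength
x, y, z, _ = rows[(0, 0, 0)]
wavelength = 1.0 / math.sqrt(x * x + y * y + z * z)
print(f"implied wavelength: {wavelength:.3f} A")
```

The computed angles round to the printed DELTA values, which supports the assumption about the given beam direction.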

Since we did not specify space group and cell parameters, XDS integrates the data in P1 (the default), and we postpone space group determination to the scaling and merging (CORRECT) step (below).

Digression: a computational investigation into the robustness of the indexing

This indexing was based on 206252 spots found (by COLSPOT) in the first half of the DATA_RANGE, i.e. using SPOT_RANGE=1 900. If we had taken e.g. a single frame only (SPOT_RANGE=1 1; we do not have to re-run COLSPOT to try this - just use JOB=IDXREF) then we would have indexed based on 392 spot positions, and the last table would be:

  INDEX_   QUALITY  DELTA    XD       YD       X       Y       Z       DH      DK      DL
  ORIGIN

  0  0  0     13.2    0.6   1279.6   1268.0  0.0078 -0.0033  0.8065    0.15    0.07    0.26
  0  0  1     13.6    1.0   1288.4   1262.8  0.0127 -0.0061  0.8065    0.03    0.03    0.07
  0 -1 -1     15.3    1.0   1250.7   1251.8 -0.0082 -0.0122  0.8064    0.04    0.04    0.07
  1  0 -2     20.9    1.0   1255.5   1296.5 -0.0055  0.0126  0.8065    0.16    0.07    0.26
  0 -1  0     21.5    1.1   1259.6   1246.8 -0.0033 -0.0150  0.8064    0.12    0.06    0.21
  1  0 -3     23.1    1.3   1246.6   1301.8 -0.0105  0.0155  0.8064    0.03    0.03    0.07
  0  0 -1     28.3    0.2   1270.7   1273.2  0.0029 -0.0004  0.8066    0.31    0.14    0.53
  ...

Here, the second line (starting with 0 0 1) has low values of DH, DK, DL (deviation of indices from integer values), indicating a better ORGX ORGY (namely XD=1288.4 YD=1262.8) than the one provided by the header (1265.47 1273.87) which results in XD=1279.6 YD=1268.0 (first line, starting with 0 0 0), and is going to lead to mis-indexing. This mis-indexing actually happens if one does not correct ORGX and ORGY when indexing based on a single frame only: if not corrected, in CORRECT.LP the ISa is 1.03 (instead of >40), and the R-values are around 100%.

We may experiment with increasing SPOT_RANGEs to see how many frames are required to make IDXREF find the correct ORGX ORGY. The result is that with SPOT_RANGE up to 1 500 the correct indexing is not found automatically, whereas with SPOT_RANGE=1 600 and higher the correct indexing is identified. It is therefore a good idea to use a large SPOT_RANGE to give IDXREF more information about the lattice, and to increase the chances of automatic identification of the correct indexing.
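
Such an experiment means editing the same two lines of XDS.INP repeatedly. The following sketch shows one way to generate the variants as text; the helper set_keyword is hypothetical and deliberately simple: it assumes one keyword per line, replaces every active line of that keyword, and drops any trailing comment on the replaced line. Running XDS on each variant is left to the user.

```python
import re


def set_keyword(inp_text, keyword, value):
    """Replace the value of each active 'KEYWORD=' line; append the line if absent."""
    pattern = re.compile(rf"^([ \t]*){re.escape(keyword)}=.*$", re.MULTILINE)
    replacement = f"{keyword}= {value}"
    new_text, count = pattern.subn(lambda m: m.group(1) + replacement, inp_text)
    if count == 0:
        new_text = inp_text.rstrip("\n") + "\n" + replacement + "\n"
    return new_text


# Shortened stand-in for a real XDS.INP
example = (
    "JOB= XYCORR INIT COLSPOT IDXREF\n"
    "DATA_RANGE=1 1800\n"
    "SPOT_RANGE=1 900\n"
)

variants = {}
for last_frame in (1, 300, 600):
    text = set_keyword(example, "JOB", "IDXREF")
    text = set_keyword(text, "SPOT_RANGE", f"1 {last_frame}")
    variants[last_frame] = text

print(variants[600])
```

Lines that are commented out with a leading "!" are left alone, because the pattern only matches lines that start with the keyword itself.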

It should be noted that even with a single frame used for indexing, the table gives a very clear indication that the provided ORGX ORGY is incorrect, and better values are suggested - but the user has to read and understand the table, and act accordingly!
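
Reading the table can be mimicked in a few lines of code. The sketch below parses the single-frame table and picks the row with the smallest DH+DK+DL, using QUALITY to break ties. This ranking rule is an illustrative assumption, not the criterion that IDXREF itself applies, and it does not replace looking at the table.

```python
TABLE = """\
  0  0  0     13.2    0.6   1279.6   1268.0  0.0078 -0.0033  0.8065    0.15    0.07    0.26
  0  0  1     13.6    1.0   1288.4   1262.8  0.0127 -0.0061  0.8065    0.03    0.03    0.07
  0 -1 -1     15.3    1.0   1250.7   1251.8 -0.0082 -0.0122  0.8064    0.04    0.04    0.07
  1  0 -2     20.9    1.0   1255.5   1296.5 -0.0055  0.0126  0.8065    0.16    0.07    0.26
  0 -1  0     21.5    1.1   1259.6   1246.8 -0.0033 -0.0150  0.8064    0.12    0.06    0.21
  1  0 -3     23.1    1.3   1246.6   1301.8 -0.0105  0.0155  0.8064    0.03    0.03    0.07
  0  0 -1     28.3    0.2   1270.7   1273.2  0.0029 -0.0004  0.8066    0.31    0.14    0.53
"""


def parse_rows(text):
    """Turn the whitespace-separated table rows into dictionaries."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) != 13:
            continue  # skip headers, blank lines and '...'
        quality, delta, xd, yd = (float(t) for t in tokens[3:7])
        dh, dk, dl = (float(t) for t in tokens[10:13])
        rows.append({
            "index_origin": tuple(int(t) for t in tokens[:3]),
            "quality": quality, "delta": delta, "xd": xd, "yd": yd,
            "misfit": dh + dk + dl,
        })
    return rows


def best_by_index_fit(rows):
    """Smallest DH+DK+DL wins; ties are broken by the smaller QUALITY value."""
    return min(rows, key=lambda r: (round(r["misfit"], 2), r["quality"]))


best = best_by_index_fit(parse_rows(TABLE))
print("suggested INDEX_ORIGIN:", best["index_origin"])
print(f"suggested ORGX= {best['xd']}  ORGY= {best['yd']}")
```

For this table the rule selects the row starting with 0 0 1, i.e. the same beam position that the discussion above arrives at.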

Integration results, first pass

While we were inspecting IDXREF.LP, the program has been busy processing the frames. We change to the INTEGRATE tab.


Particularly noteworthy are the following findings:

  1. the distance decreases almost monotonically, so using the DISTANCE parameter for the REFINE(INTEGRATE) keyword is meaningful - it probably indicates that radiation damage increases the cell constants somewhat, which is compensated by the distance refinement.
  2. the missetting angles (bottom plot) oscillate with an amplitude of up to 0.5°, so the orientation matrix or other aspects of the geometric description are inaccurate.

Checking the agreement of actual and predicted reflections

After INTEGRATE, we may move to the Frame tab and load FRAME.cbf:

 

It is interesting to see that XDSGUI (which uses only the values from XDS.INP) places the low-resolution circle (green) around the header-specified ORGX, ORGY. The white circular area, however, indicates where XDS positioned the low-resolution exclusion after geometric refinement in IDXREF and INTEGRATE. In this case of particularly wrong header values, the two circles disagree strongly. If we updated XDS.INP with corrected ORGX, ORGY values, the circles would agree.

The most important finding is, however, that the actual and predicted reflections match nicely. We may zoom in and inspect individual reflections. This will reveal that at large 2theta, the integration areas are elongated - this illustrates XDS' internal geometric transformations that lead to undistorted reflection profiles, which is a requirement for high-accuracy profile fitting.

 

CORRECT: scaling and merging results (first pass)

We move to the CORRECT tab and inspect the left-hand text. It may be helpful to use ctrl-- (control-minus) to lower the font size!

The program reports about the point lattices it tried:

SPACE-GROUP         UNIT CELL CONSTANTS            UNIQUE   Rmeas  COMPARED  LATTICE-
  NUMBER      a      b      c   alpha beta gamma                            CHARACTER

      5      81.8   81.8  150.0  90.0  90.0  90.0    3802     7.0    22754    10 mC
     75      57.8   57.8  150.0  90.0  90.0  90.0    1896    11.1    24660    11 tP
     89      57.8   57.8  150.0  90.0  90.0  90.0    1125    11.1    25431    11 tP
  *  21      81.8   81.8  150.0  90.0  90.0  90.0    2032     7.9    24524    13 oC
      5      81.8   81.8  150.0  90.0  90.0  90.0    3802     7.0    22754    14 mC
      1      57.8   57.8  150.0  90.0  90.0  90.0    6919     4.4    19637    31 aP
     16      57.8   57.8  150.0  90.0  90.0  90.0    2095    11.1    24461    32 oP
      3      57.8   57.8  150.0  90.0  90.0  90.0    3850     8.5    22706    35 mP
      3      57.8  150.0   57.8  90.0  90.0  90.0    3749     7.8    22807    34 mP
      1      57.8   57.8  150.0  90.0  90.0  90.0    6919     4.4    19637    44 aP
 

************ SELECTED SPACE GROUP AND UNIT CELL FOR THIS DATA SET ************

SPACE_GROUP_NUMBER=   21
UNIT_CELL_CONSTANTS=    81.77    81.78   149.99  90.000  90.000  90.000

The true space group (92) was unfortunately not identified by the simple heuristic that XDS uses. This is because the Rmeas in the resolution range 10-5Å is 11.1% for the tetragonal lattice (space group 89), and this is more than 2 times (MAX_FAC_RMEAS) higher than the Rmeas in the triclinic lattice, 4.4%. The next highest symmetry is space group 21, and since its Rmeas is only 7.9%, it is chosen. If we go to the TOOLS menu / "Further analysis" / "determine spacegroup with pointless" we see in the console window (!) that pointless suggests either space group 92 (correct!) or 96 (enantiomorph of 92; equivalent at this stage). We thus have to teach XDS to use 92, by transferring this information into XDS.INP: we change

JOB= XYCORR INIT COLSPOT DEFPIX INTEGRATE CORRECT
...
SPACE_GROUP_NUMBER=0
UNIT_CELL_CONSTANTS=70 80 90 90 90 90

to

JOB= CORRECT
SPACE_GROUP_NUMBER=92
UNIT_CELL_CONSTANTS=57.8   57.8  150.0  90.0  90.0  90.0

press "Save" and "Run XDS".
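
The reasoning about MAX_FAC_RMEAS can be replayed on the table from CORRECT.LP. The sketch below is a strongly simplified stand-in for what XDS does: it keeps every candidate whose Rmeas is at most MAX_FAC_RMEAS times the triclinic Rmeas and then takes the one with the fewest unique reflections, using that count as a proxy for the highest symmetry. Both the proxy and the function pick_lattice are assumptions made for illustration.

```python
from collections import namedtuple

Lattice = namedtuple("Lattice", "space_group bravais unique rmeas")

# Rows of the lattice table above (space group, Bravais type, UNIQUE, Rmeas in %)
CANDIDATES = [
    Lattice(5, "mC", 3802, 7.0),
    Lattice(75, "tP", 1896, 11.1),
    Lattice(89, "tP", 1125, 11.1),
    Lattice(21, "oC", 2032, 7.9),
    Lattice(5, "mC", 3802, 7.0),
    Lattice(1, "aP", 6919, 4.4),
    Lattice(16, "oP", 2095, 11.1),
    Lattice(3, "mP", 3850, 8.5),
    Lattice(3, "mP", 3749, 7.8),
    Lattice(1, "aP", 6919, 4.4),
]


def pick_lattice(candidates, max_fac_rmeas=2.0):
    """Simplified selection: highest symmetry whose Rmeas stays below the limit."""
    reference = min(c.rmeas for c in candidates if c.space_group == 1)
    limit = max_fac_rmeas * reference
    acceptable = [c for c in candidates if c.rmeas <= limit]
    # Assumption: fewer unique reflections means higher symmetry.
    return min(acceptable, key=lambda c: c.unique)


default_choice = pick_lattice(CANDIDATES)
relaxed_choice = pick_lattice(CANDIDATES, max_fac_rmeas=3.0)
print("limit 2.0 x 4.4% = 8.8%  ->", default_choice)
print("limit 3.0 x 4.4% = 13.2% ->", relaxed_choice)
```

With the default factor the tetragonal candidates (11.1%) are rejected and space group 21 remains, as in the log; with a larger factor the sketch would accept space group 89. This only reproduces the point group; distinguishing 92 from 89 needs the screw-axis analysis that pointless performs.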

This forces the correct space group, and the plots in the CORRECT tab then look like this:


So the data are quite good to the edge of the detector. In principle we could have processed into the corners of the detector; this would have required

TRUSTED_REGION=0 1.42

but then the completeness would drop to zero at the highest resolution.

Further statistics from XDSSTAT

After moving to the XDSSTAT tab, we click "run xdsstat" and obtain a set of plots (the upper two panels are left out here).

We may also look at 2-dimensional representations of indicators mapped onto the detector surface, by using the "view" pull-down menu. However, none of them looks "interesting" enough to be reproduced here!

What is interesting, however, is

  1. the oscillating behaviour of the CORR plot (upper panel)
  2. the slight rise of R_d (radiation damage indicator), from about 4% (left side) to about 6% (right side). This corresponds to a 50% increase in R_meas and indicates that the influence of radiation damage at the end of data collection is about as high as the overall level of all other errors. The argument goes like this: two uncorrelated sources of error that both contribute to R_meas have to be combined by adding their variances. If their variances are equal, the total variance is doubled. For R_meas, this means a sqrt(2)=1.4142-fold increase. Here we observe a roughly 1.5-fold increase, which is even slightly more than sqrt(2). To get a more accurate idea of the amount of radiation damage relative to the other sources of error, one could fit a weighted least-squares line to the R_d plot, and base the calculation on that.
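
The variance argument in the second point can be written out numerically. The sketch below assumes, as the text does, that R-values behave like standard deviations of uncorrelated errors, so that contributions add in quadrature; the start and end values of 4% and 6% are read off the plot by eye.

```python
import math


def damage_component(r_start, r_end):
    """Error term that, added in quadrature to r_start, gives r_end."""
    return math.sqrt(r_end ** 2 - r_start ** 2)


r_start = 4.0  # R_d at the beginning of data collection, in percent
r_end = 6.0    # R_d at the end of data collection, in percent

damage = damage_component(r_start, r_end)
ratio = r_end / r_start

print(f"observed increase: {ratio:.2f}-fold (sqrt(2) = {math.sqrt(2):.4f})")
print(f"implied damage contribution: {damage:.2f}% versus {r_start:.1f}% from all other errors")
```

Under these assumptions the damage term at the end of the data set comes out slightly larger than the baseline error, consistent with the observed increase being a little more than sqrt(2).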

Improved processing (round 2)

The TOOLS menu of XDSGUI / "Saving and comparing good results" offers "backup files to ./save". We do this now - it is useful to be able to go back to the current state in case any changes mess something up, or just for comparing files - there is a button "compare CORRECT.LP with previous best". Please note that by default this uses the "xdiff" binary. If this graphical tool is not installed on your system, the "xxdiff" or "tkdiff" tool may be available, or you could have one of the two installed. In that case, just change the line below the button accordingly.

Next, we click "Optimizing data quality" / "copy latest geometry description over previous one". As can be seen in the line below (which may be edited!), this just overwrites the previous XPARM.XDS with the GXPARM.XDS that CORRECT has written: it has the space group information, and refined crystal setting and beam and spindle directions.

Next, we go back to the XDS.INP tab and change

JOB= DEFPIX INTEGRATE CORRECT ! DEFPIX is not really necessary but it doesn't hurt either

and then click "Save" and "Run XDS". We may move to the INTEGRATE tab and see the plots evolving. When INTEGRATE is done, we may move to the CORRECT tab and see the plots. In this case they are so similar to the previous ones that they are not reproduced here. However, the statistics in the highest shells are slightly better (check for yourself!). What has also improved is the CORR plot from XDSSTAT:

 

It no longer oscillates, and the level is slightly higher. This shows that it is useful to

mv GXPARM.XDS XPARM.XDS

as an optimization step.
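
When this step is scripted rather than done through the GUI, it may be worth keeping the old file. The following sketch is a hypothetical helper, not part of XDS or XDSGUI: it copies GXPARM.XDS over XPARM.XDS and first saves the previous XPARM.XDS under the made-up name XPARM.XDS.orig. Unlike the mv command above, it leaves GXPARM.XDS in place.

```python
import shutil
from pathlib import Path


def promote_refined_geometry(workdir="."):
    """Copy GXPARM.XDS over XPARM.XDS, keeping a backup of the old XPARM.XDS.

    Returns the path of the backup file, or None if there was nothing to back up.
    """
    workdir = Path(workdir)
    refined = workdir / "GXPARM.XDS"
    current = workdir / "XPARM.XDS"
    if not refined.is_file():
        raise FileNotFoundError(f"{refined} not found; has CORRECT been run?")
    backup = None
    if current.is_file():
        backup = workdir / "XPARM.XDS.orig"  # hypothetical backup name
        shutil.copy2(current, backup)
    shutil.copy2(refined, current)
    return backup


# Usage, from within the processing directory:
#     promote_refined_geometry(".")
```

After this, JOB= DEFPIX INTEGRATE CORRECT can be run as described above.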

Solving the structure

We run hkl2map and load XDS_ASCII.HKL - SHELXC understands the XDS format and picks up cell and space group.

 

 

Next, we run SHELXC and find that the anomalous CC1/2 extends to the high resolution edge.

 

Nevertheless, we accept the defaults (because the data are so good ...), let the GUI calculate the solvent content from the number of residues (207)

 

 

... and run SHELXD, searching for 15 sites. This is just an estimate; based on the "Site occupancy versus Peak Number" plot there are really 16 strong (sulfur) sites.
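
The solvent-content estimate that the GUI derives from the number of residues can be approximated with the usual Matthews-coefficient calculation. The sketch below is not the GUI's own code: it assumes an average residue mass of 110 Da, one molecule per asymmetric unit, the conventional factor 1.23 for protein, and 8 asymmetric units per cell for space group 92.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # assumption: typical average for proteins
PROTEIN_FACTOR = 1.23            # conventional constant in the Matthews formula


def matthews(a, b, c, n_residues, asu_per_cell, molecules_per_asu=1):
    """Return (Matthews coefficient in A^3/Da, solvent fraction) for an orthogonal cell."""
    volume = a * b * c  # valid because all cell angles are 90 degrees
    mass = n_residues * AVERAGE_RESIDUE_MASS_DA * molecules_per_asu
    vm = volume / (asu_per_cell * mass)
    solvent_fraction = 1.0 - PROTEIN_FACTOR / vm
    return vm, solvent_fraction


vm, solvent = matthews(57.8, 57.8, 150.0, n_residues=207, asu_per_cell=8)
print(f"Matthews coefficient: {vm:.2f} A^3/Da, solvent content: {100 * solvent:.0f}%")
```

With these assumptions the solvent content comes out at a little over half of the cell volume.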

 

At this point, the structure can be considered solved: SHELXE built 197 residues, and it would be really easy to finish this up.