Pathologies

How can we judge the quality of a data set? There are several possibilities:

  1. numerical indicators, like R-values, I/sigma and the like
  2. graphical representations

This article demonstrates pathological cases. It collects examples of

  1. problems with the hardware (e.g. detector, beamline, goniostat, beam, cryo)
  2. problems with the crystal
  3. problems with data processing

Hardware problems

Scale factor plot in case of problems (beam or spindle)

 

INTEGRATE.LP lists the scale factor for every frame (column 3). This plot shows spikes every 13 frames or so, indicating that either the beam was weak or the spindle moved too fast on those frames (in the latter case, the spindle would have moved slightly slower on the adjacent frames). Whatever the cause, the effect on data quality is quite detrimental: the reflections that contribute to the affected frames are underexposed.
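To look for such spikes without plotting, the per-frame table can be read with a short script. The following is a minimal sketch, not part of XDS: it assumes that each per-frame row in INTEGRATE.LP has ten whitespace-separated fields with the image number in column 1, the scale factor in column 3 and the mosaicity in column 10. The function names and the `rel_tol` threshold are hypothetical and should be adapted to the data set.

```python
from statistics import median


def parse_frame_table(lines):
    """Collect (image, scale, mosaicity) tuples from INTEGRATE.LP-style lines.

    Assumption: a per-frame row has exactly 10 whitespace-separated fields,
    two integers (image number, error flag) followed by 8 numeric values.
    Lines that do not match this shape (headers, other tables) are skipped.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 10:
            continue
        try:
            image = int(fields[0])
            int(fields[1])                      # error flag, only validated
            numbers = [float(f) for f in fields[2:]]
        except ValueError:
            continue
        rows.append((image, numbers[0], numbers[-1]))  # columns 1, 3, 10
    return rows


def flag_scale_outliers(rows, rel_tol=0.1):
    """Return image numbers whose scale factor deviates from the median
    scale factor by more than rel_tol (as a fraction of the median)."""
    if not rows:
        return []
    med = median(scale for _, scale, _ in rows)
    return [image for image, scale, _ in rows
            if abs(scale - med) > rel_tol * abs(med)]
```

Typical use would be to pass an open file object for INTEGRATE.LP to `parse_frame_table` and then inspect the list returned by `flag_scale_outliers`. A global median is a crude baseline; for data sets with a strong overall trend in the scale factor, a running median would be more appropriate.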

Mosaicity plots in case of problems

 

The same data set: the mosaicity estimates of individual frames (column 10 in INTEGRATE.LP) are very much influenced by this. The "jumps" in the curve arise because INTEGRATE was run with MAXIMUM_NUMBER_OF_JOBS=8: each of the 8 jobs uses the orientation matrix from IDXREF for its initial batch, and since that matrix does not seem to match the actual orientation, the mosaicity appears high. Only after geometry refinement (green line) is the result reasonable (and thus the intensity estimates will not be affected). The estimate for the second batch of each job is much better, because it uses the orientation obtained from the geometry refinement as a starting point.

Exactly why the IDXREF estimate is off, and whether this has anything to do with the spindle problem, is unknown.

With MAXIMUM_NUMBER_OF_JOBS=1 the plot would definitely not look like this: it would be much smoother, because each batch of data "knows" about the orientation of the previous one.
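The jumps described above can also be located numerically. This is a minimal sketch under stated assumptions: the frame numbers and the per-frame mosaicity values (column 10 of INTEGRATE.LP) have already been extracted into two lists of equal length, and the function name and the default threshold of 0.1 degrees are hypothetical choices, not XDS conventions.

```python
def find_mosaicity_jumps(frames, mosaicity, threshold=0.1):
    """Return the frame numbers at which the mosaicity estimate differs
    from that of the preceding frame by more than `threshold` (degrees)."""
    jumps = []
    for i in range(1, len(frames)):
        if abs(mosaicity[i] - mosaicity[i - 1]) > threshold:
            jumps.append(frames[i])
    return jumps
```

If the reported frames are spaced at regular intervals that match the number of jobs, this would be consistent with the explanation given above; the sketch itself makes no assumption about how the frames are distributed over the jobs.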

 

The same hardware problem, but a different data set: here, the mosaicity estimates of individual frames are less affected, because the initial orientation from IDXREF is good. The oscillations are hard to see at this scale, since the period of the scale factor changes is only on the order of 13 frames.

 

Zoomed version of the above. The oscillations are more clearly visible.
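A period such as the roughly 13 frames mentioned above can be estimated from the spacing between affected frames. The following is a minimal sketch, assuming a list of frame numbers at which the scale factor spikes has already been compiled (by eye or by script); the function name is hypothetical.

```python
from statistics import median


def estimate_period(spike_frames):
    """Return the median spacing (in frames) between consecutive spikes,
    or None if fewer than two spike frames are given."""
    frames = sorted(spike_frames)
    gaps = [later - earlier for earlier, later in zip(frames, frames[1:])]
    return median(gaps) if gaps else None
```

The median is used rather than the mean so that a single missed spike, which would double one gap, does not distort the estimate.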