R-factors
Revision as of 19:17, 18 January 2009
Definitions
Data quality indicators
In the following, all sums over hkl extend only over unique reflections with more than one observation!
- Rsym and Rmerge - the formula for both is:
[math]\displaystyle{
R = \frac{\sum_{hkl} \sum_{j} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}}
}[/math]
where [math]\displaystyle{ \langle I_{hkl}\rangle }[/math] is the average of symmetry- (or Friedel-) related observations of a unique reflection.
It can be shown that this formula results in higher R-factors when the redundancy is higher (Diederichs and Karplus [1]). In other words, low-redundancy datasets appear better than high-redundancy ones, which obviously violates the intention of having an indicator of data quality!
- Redundancy-independent version of the above:
[math]\displaystyle{
R_{meas} = \frac{\sum_{hkl} \sqrt \frac{n}{n-1} \sum_{j=1}^{n} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}}
}[/math]
which unfortunately results in higher (but more realistic) numerical values than Rsym / Rmerge (Diederichs and Karplus [1] , Weiss and Hilgenfeld [2]).
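As a rough illustration of the two formulas above, the following Python sketch computes Rsym/Rmerge and Rmeas from a small set of made-up intensities. The function name, the dictionary layout (unique hkl mapped to a list of observed intensities) and the numbers are assumptions chosen for this example; they are not taken from any data processing program.

```python
from math import sqrt


def r_merge_and_r_meas(observations):
    """Return (R_merge, R_meas) for a dict {hkl: [I_1, I_2, ...]}.

    Unique reflections with fewer than two observations are skipped,
    as required for the sums over hkl.
    """
    numerator_merge = 0.0
    numerator_meas = 0.0
    denominator = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue
        mean_intensity = sum(intensities) / n
        abs_deviation = sum(abs(i - mean_intensity) for i in intensities)
        numerator_merge += abs_deviation
        # R_meas weights each unique reflection by sqrt(n / (n - 1)).
        numerator_meas += sqrt(n / (n - 1)) * abs_deviation
        denominator += sum(intensities)
    if denominator == 0.0:
        raise ValueError("no unique reflection with more than one observation")
    return numerator_merge / denominator, numerator_meas / denominator


# Hypothetical example data: three unique reflections, one measured only once.
example = {
    (1, 0, 0): [100.0, 110.0],
    (0, 1, 0): [50.0, 55.0, 45.0],
    (0, 0, 1): [20.0],
}
r_merge, r_meas = r_merge_and_r_meas(example)
print(f"R_merge = {r_merge:.4f}, R_meas = {r_meas:.4f}")
```

Because the factor sqrt(n/(n-1)) is always larger than 1, Rmeas is never smaller than Rsym/Rmerge for the same data, and the difference is largest at low multiplicity.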
measuring quality of averaged intensities/amplitudes
for intensities use (Weiss [3])
[math]\displaystyle{
R_{p.i.m.} = \frac{\sum_{hkl} \sqrt \frac{1}{n-1} \sum_{j=1}^{n} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}}
}[/math]
Rmrgd-I is similarly defined in Diederichs and Karplus [1].
Similarly, one should use Rmrgd-F as a quality indicator for amplitudes [1], which may be calculated as:
[math]\displaystyle{
R_{mrgd-F} = \frac{\sum_{hkl} \sqrt \frac{1}{n-1} \sum_{j=1}^{n} \vert F_{hkl,j}-\langle F_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}F_{hkl,j}}
}[/math]
with [math]\displaystyle{ \langle F_{hkl}\rangle }[/math] defined analogously to [math]\displaystyle{ \langle I_{hkl}\rangle }[/math].
In the sums above, the summation omits those reflections with just one observation.
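The Rp.i.m. formula above can be sketched in Python as follows. The function name and the input layout (unique hkl mapped to a list of observed intensities) are assumptions made for this example only. The last lines check that a reflection with a single observation does not change the result, in line with the rule on omitted reflections.

```python
from math import sqrt


def r_pim(observations):
    """Return R_p.i.m. for a dict {hkl: [I_1, I_2, ...]}."""
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            # Reflections with just one observation are omitted from the sums.
            continue
        mean_intensity = sum(intensities) / n
        abs_deviation = sum(abs(i - mean_intensity) for i in intensities)
        # Weight 1/sqrt(n - 1): the precision of the mean improves with n.
        numerator += sqrt(1.0 / (n - 1)) * abs_deviation
        denominator += sum(intensities)
    if denominator == 0.0:
        raise ValueError("no unique reflection with more than one observation")
    return numerator / denominator


# Hypothetical example data.
example = {
    (1, 0, 0): [100.0, 110.0],
    (0, 1, 0): [50.0, 55.0, 45.0],
}
with_singleton = dict(example)
with_singleton[(0, 0, 1)] = [20.0]

print(f"R_pim = {r_pim(example):.4f}")
print("unchanged by singleton:", r_pim(example) == r_pim(with_singleton))
```

If every unique reflection has the same multiplicity n, this value equals Rmeas divided by sqrt(n), since sqrt(1/(n-1)) = sqrt(n/(n-1)) / sqrt(n).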
measuring radiation damage
We can plot (Diederichs [4])
[math]\displaystyle{ R_{d} = \frac{\sum_{hkl} \sum_{|i-j|=d} \vert I_{hkl,i} - I_{hkl,j}\vert}{\sum_{hkl} \sum_{|i-j|=d} (I_{hkl,i} + I_{hkl,j})/2} }[/math]
which gives the R-factor between pairs of observations of the same unique reflection measured d frames apart. As long as the plot runs parallel to the x axis, there is no radiation damage. As soon as the plot starts to rise, there is a systematic error contribution due to radiation damage.
Strong wiggles at very high d are irrelevant, as only a few reflections contribute.
To my knowledge, the only program that implements this currently (December 2008) is XDSSTAT.
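To make the definition of Rd concrete, here is a minimal Python sketch. It is not how XDSSTAT is implemented; the function name and the input layout (unique hkl mapped to a list of (frame number, intensity) pairs) are assumptions for illustration. Each pair of observations is counted once, which leaves the ratio unchanged compared with counting every pair in both orders.

```python
from collections import defaultdict
from itertools import combinations


def r_d(observations):
    """Return {d: R_d} for a dict {hkl: [(frame, intensity), ...]}."""
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for frame_intensity_pairs in observations.values():
        # All pairs of observations of the same unique reflection.
        for (frame_a, int_a), (frame_b, int_b) in combinations(frame_intensity_pairs, 2):
            d = abs(frame_a - frame_b)
            numerator[d] += abs(int_a - int_b)
            denominator[d] += (int_a + int_b) / 2.0
    return {d: numerator[d] / denominator[d]
            for d in sorted(numerator) if denominator[d] > 0.0}


# Hypothetical example: intensities that decay with frame number.
example = {
    (1, 0, 0): [(1, 100.0), (11, 98.0), (21, 90.0)],
    (0, 1, 0): [(5, 50.0), (15, 49.0)],
}
for d, value in r_d(example).items():
    print(f"d = {d:3d}  R_d = {value:.4f}")
```

In this made-up example R_d is larger at d = 20 than at d = 10, which is the kind of rise with d that the plot is meant to reveal.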
Model quality indicators
- R and Rfree : the formula for both is
[math]\displaystyle{
R=\frac{\sum_{hkl}\vert F_{hkl}^{obs}-F_{hkl}^{calc}\vert}{\sum_{hkl} F_{hkl}^{obs}}
}[/math]
where [math]\displaystyle{ F_{hkl}^{obs} }[/math] and [math]\displaystyle{ F_{hkl}^{calc} }[/math] have to be scaled w.r.t. each other. R and Rfree differ in the set of reflections they are calculated from: R is calculated for the working set, whereas Rfree is calculated for the test set.
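The following Python sketch shows the same formula applied once to the working set and once to the test set. It assumes that the calculated amplitudes have already been put on the scale of the observed ones; refinement programs do this scaling themselves, in ways not covered here. The function names, the tuple layout and the numbers are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Return sum|F_obs - F_calc| / sum(F_obs) for two equally long sequences.

    Assumes F_calc is already scaled to F_obs.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty sequences of equal length")
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


def r_work_and_r_free(reflections):
    """Return (R, R_free) for a list of (F_obs, F_calc, is_test) tuples."""
    work = [(fo, fc) for fo, fc, is_test in reflections if not is_test]
    test = [(fo, fc) for fo, fc, is_test in reflections if is_test]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


# Hypothetical example: two working-set and two test-set reflections.
example = [
    (100.0, 95.0, False),
    (50.0, 55.0, False),
    (80.0, 70.0, True),
    (20.0, 25.0, True),
]
r_work, r_free = r_work_and_r_free(example)
print(f"R = {r_work:.4f}, R_free = {r_free:.4f}")
```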
what do R-factors try to measure, and how to interpret their values?
- relative deviation of
Data quality
- typical values: ...
Model quality
Relation between R and Rfree as a function of resolution
References:
- Tickle IJ, Laskowski RA and Moss DS. Rfree and the Rfree Ratio. I. Derivation of Expected Values of Cross-Validation Residuals Used in Macromolecular Least-Squares Refinement. Acta Cryst. (1998). D54, 547-557 [5]
- Tickle IJ, Laskowski RA and Moss DS. Rfree and the Rfree ratio. II. Calculation of the expected values and variances of cross-validation statistics in macromolecular least-squares refinement. Acta Cryst. (2000). D56, 442-450 [6]
- GJ Kleywegt and TA Jones (2002). Homo Crystallographicus - Quo vadis? Structure 10, 465-472. (reprint from http://xray.bmc.uu.se/cgi-bin/gerard/reprint_mailer.pl?pref=65)
- formula from that paper: Rfree = 1.065*R + 0.036
- plot with empirical data: http://xray.bmc.uu.se/gerard/supmat/rfree2000/rfminusr_vs_resolution.gif
- many more plots: http://xray.bmc.uu.se/gerard/supmat/rfree2000
- harry plotter (java): http://xray.bmc.uu.se/gerard/supmat/rfree2000/plotter.html
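As a small worked example, the empirical relation quoted above from Kleywegt and Jones, Rfree = 1.065*R + 0.036, can be evaluated like this. The function name is made up for this sketch, and the relation describes an average trend in deposited structures, not a value any particular structure must reach.

```python
def expected_r_free(r_work):
    """Empirical estimate R_free = 1.065 * R + 0.036 (R given as a fraction)."""
    return 1.065 * r_work + 0.036


for r in (0.15, 0.20, 0.25):
    estimate = expected_r_free(r)
    print(f"R = {r:.3f}  expected R_free = {estimate:.3f}  gap = {estimate - r:.3f}")
```

Since the slope is slightly larger than 1, the expected gap between Rfree and R grows slowly as R increases.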
what kinds of problems exist with these indicators?
- Rsym / Rmerge should not be used to judge data quality; Rmeas should be used instead. The reason is that the former depend on multiplicity, whereas the latter doesn't.
- R/Rfree and NCS: reflections in work and test set are not independent if chosen randomly. It is better to choose the test set reflections in thin resolution shells (FIXME: references and programs for this). A paper investigating this thoroughly is Fabiola, F., A. Korostelev, et al. (2006). "Bias in cross-validated free R factors: mitigation of the effects of non-crystallographic symmetry." Acta Crystallogr D Biol Crystallogr 62(Pt 3): 227-38.
- Sets of reflections used for calculating Rfree should be maintained throughout a project. This is nicely discussed at http://www.bmsc.washington.edu/people/merritt/xplor/rfree_example.html .
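The idea of choosing the test set in thin resolution shells can be sketched as below. This is only an illustration of the principle, not the procedure of Fabiola et al. or of any existing program; the function name, the parameters `n_shells` and `test_every`, and the equal-count shell definition are all assumptions. Reflections are sorted by resolution, cut into thin shells with equal numbers of reflections, and every `test_every`-th shell is flagged as test set in its entirety.

```python
def thin_shell_test_flags(d_spacings, n_shells=100, test_every=20):
    """Return a list of booleans (True = test set), one per reflection.

    d_spacings: resolution of each reflection in Angstrom.
    n_shells:   number of thin shells with (nearly) equal reflection counts.
    test_every: every test_every-th shell is assigned to the test set.
    """
    n_reflections = len(d_spacings)
    # Indices ordered from low resolution (large d) to high resolution (small d).
    order = sorted(range(n_reflections), key=lambda i: d_spacings[i], reverse=True)
    flags = [False] * n_reflections
    for rank, index in enumerate(order):
        shell = rank * n_shells // n_reflections
        if shell % test_every == test_every - 1:
            flags[index] = True
    return flags


# Hypothetical example: 1000 reflections with made-up d-spacings.
d_values = [50.0 / (i + 1) ** 0.5 for i in range(1000)]
flags = thin_shell_test_flags(d_values)
print(f"{sum(flags)} of {len(flags)} reflections are in the test set")
```

With the default parameters, 5 of the 100 shells are selected, so about 5% of the reflections end up in the test set. Once generated, the flags should be stored and reused, since the test set has to stay the same throughout a project.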
Notes
- [1] K. Diederichs and P.A. Karplus (1997). Improved R-factors for diffraction data analysis in macromolecular crystallography. Nature Struct. Biol. 4, 269-275.
- [2] M.S. Weiss and R. Hilgenfeld (1997). On the use of the merging R-factor as a quality indicator for X-ray data. J. Appl. Cryst. 30, 203-205.
- [3] M.S. Weiss (2001). Global indicators of X-ray data quality. J. Appl. Cryst. 34, 130-135.
- [4] K. Diederichs (2006). Some aspects of quantitative analysis and correction of radiation damage. Acta Cryst. D62, 96-101.