R-factors: Difference between revisions

Revision as of 16:05, 26 January 2018

3,187 bytes added , 26 January 2018

→‎Definitions

Karsten


@@ Line 3: / Line 3: @@
 In the following, all sums over hkl extend only over unique reflections with more than one observation!
 * R<sub>sym</sub> and R<sub>merge</sub> - the formula for both is:
- <math>
- R = \frac{\sum_{hkl} \sum_{j} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}}
- </math>
+: <math>
-where <math>\langle I_{hkl}\rangle</math> is the average of symmetry- (or Friedel-) related observations of a unique reflection.
+R = \frac{\sum_{hkl} \sum_{j} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}}
+</math>
+where <math>\langle I_{hkl}\rangle</math> is the average of symmetry- (or Friedel-) related observations of a unique reflection. The formula is due to Arndt, U.W., Crowther, R.A. & Mallett, J.F.W. A computer-linked cathode ray tube microdensitometer for X-ray crystallography. J. Phys. E: Sci. Instrum. 1, 510-516 (1968). Any unique reflection with n=2 or more observations enters the sums.
 It can be shown that this formula results in higher R-factors when the redundancy is higher (Diederichs and Karplus <ref name="DiKa97">K. Diederichs and P.A. Karplus (1997). Improved R-factors for diffraction data analysis in macromolecular crystallography. Nature Struct. Biol. 4, 269-275 [http://strucbio.biologie.uni-konstanz.de/strucbio/files/nsb-1997.pdf]</ref>). In other words, low-redundancy datasets appear better than high-redundancy ones, which obviously violates the intention of having an indicator of data quality!
 * Redundancy-independent version of the above:
- <math>
- R_{meas} = \frac{\sum_{hkl} \sqrt \frac{n}{n-1} \sum_{j=1}^{n} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}}
- </math>
+: <math>
+R_{meas} = \frac{\sum_{hkl} \sqrt{\frac{n}{n-1}} \sum_{j=1}^{n} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}}
+</math>
 which unfortunately results in higher (but more realistic) numerical values than R<sub>sym</sub> / R<sub>merge</sub>
 (Diederichs and Karplus <ref name="DiKa97"/> ,
 Weiss and Hilgenfeld <ref name="WeHi97">M.S. Weiss and R. Hilgenfeld (1997) On the use of the merging R-factor as a quality indicator for X-ray data. J. Appl. Crystallogr. 30, 203-205[http://dx.doi.org/10.1107/S0021889897003907]</ref>).
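The two formulas above can be sketched in a few lines of Python. This is only an illustration, not code from any data-processing program: the function name `r_merge_and_r_meas` and the input layout (a dict from unique hkl to a list of observed intensities) are assumptions made for this example.

```python
import math

def r_merge_and_r_meas(observations):
    """Return (R_merge, R_meas) for a dict {hkl: [I_1, I_2, ...]}.

    The dict layout is an assumption made for this sketch.
    """
    num_merge = 0.0
    num_meas = 0.0
    denom = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue  # only reflections with more than one observation enter the sums
        mean = sum(intensities) / n
        abs_dev = sum(abs(i - mean) for i in intensities)
        num_merge += abs_dev
        # R_meas weights each reflection by sqrt(n/(n-1))
        num_meas += math.sqrt(n / (n - 1)) * abs_dev
        denom += sum(intensities)
    return num_merge / denom, num_meas / denom

# Made-up intensities; the singly measured reflection is ignored.
obs = {
    (1, 0, 0): [100.0, 110.0],
    (0, 1, 0): [50.0, 55.0, 45.0],
    (0, 0, 1): [20.0],
}
r_merge, r_meas = r_merge_and_r_meas(obs)
print(f"R_merge = {r_merge:.4f}, R_meas = {r_meas:.4f}")
```

Because the factor sqrt(n/(n-1)) is always greater than 1, R<sub>meas</sub> computed this way is never smaller than R<sub>merge</sub> for the same data.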
-==== measuring quality of averaged intensities/amplitudes ====
+==== measuring precision of averaged intensities/amplitudes ====
 for intensities use
 (Weiss <ref name="We01">M.S. Weiss. Global indicators of X-ray data quality. J. Appl. Cryst. (2001). 34, 130-135 [http://dx.doi.org/10.1107/S0021889800018227]</ref>)
- <math>
- R_{p.i.m.} = \frac{\sum_{hkl} \sqrt \frac{1}{n-1} \sum_{j=1}^{n} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}}
- </math>
-R<sub>mrgd-I</sub> is similarly defined in Diederichs and Karplus <ref name="DiKa97"/>.
+: <math>
+R_{p.i.m.} = \frac{\sum_{hkl} \sqrt{\frac{1}{n-1}} \sum_{j=1}^{n} \vert I_{hkl,j}-\langle I_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}I_{hkl,j}}
+</math>
+R<sub>mrgd-I</sub> (defined in Diederichs and Karplus <ref name="DiKa97"/>) differs only by a constant factor (FIXME: what is the factor? 0.5 or 1.4142 or ?), since it likewise takes the improvement in precision from multiplicity into account. R<sub>split</sub>, which is what the X-FEL community uses, is the same as R<sub>mrgd-I</sub>, but that community seems not to be aware of this.
 Similarly, one should use R<sub>mrgd-F</sub> as a quality indicator for amplitudes <ref name="DiKa97"/>, which may be calculated as:
- <math>
+: <math>
   R_{mrgd-F} = \frac{\sum_{hkl} \sqrt{\frac{1}{n-1}} \sum_{j=1}^{n} \vert F_{hkl,j}-\langle F_{hkl}\rangle\vert}{\sum_{hkl} \sum_{j}F_{hkl,j}}
- </math>
+</math>
 with <math>\langle F_{hkl}\rangle</math> defined analogously to <math>\langle I_{hkl}\rangle</math>.
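A minimal sketch of the precision-indicating formula as written on this page, which has the same form for intensities (R<sub>p.i.m.</sub>) and amplitudes (R<sub>mrgd-F</sub> as given above). The function name `r_precision` and the dict-of-lists input are assumptions made for this example, not part of any program.

```python
import math

def r_precision(observations):
    """Evaluate the sqrt(1/(n-1))-weighted R-factor for {hkl: [values]}.

    Pass intensities for R_p.i.m., or amplitudes for the amplitude
    formula as written on this page (an assumption of this sketch).
    """
    num = 0.0
    denom = 0.0
    for values in observations.values():
        n = len(values)
        if n < 2:
            continue  # singly measured reflections do not enter the sums
        mean = sum(values) / n
        num += math.sqrt(1.0 / (n - 1)) * sum(abs(v - mean) for v in values)
        denom += sum(values)
    return num / denom

# Made-up intensities for two unique reflections.
obs = {
    (1, 0, 0): [100.0, 110.0],
    (0, 1, 0): [50.0, 55.0, 45.0],
}
r_pim = r_precision(obs)
print(f"R_p.i.m. = {r_pim:.4f}")
```

The weight sqrt(1/(n-1)) shrinks as the multiplicity n grows, which is how the indicator reflects the gain in precision from averaging more observations.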
@@ Line 39: / Line 53: @@
 We can plot (Diederichs <ref name="Di06">K. Diederichs (2006). Some aspects of quantitative analysis and correction of radiation damage. Acta Cryst D62, 96-101 [http://strucbio.biologie.uni-konstanz.de/strucbio/files/Diederichs_ActaD62_96.pdf]</ref>)
-<math>
- R_{d} = \frac{\sum_{hkl} \sum_{|i-j|=d} \vert I_{hkl,i} - I_{hkl,j}\vert}{\sum_{hkl} \sum_{|i-j|=d} (I_{hkl,i} + I_{hkl,j})/2}
+: <math>
+R_{d} = \frac{\sum_{hkl} \sum_{|i-j|=d} \vert I_{hkl,i} - I_{hkl,j}\vert}{\sum_{hkl} \sum_{|i-j|=d} (I_{hkl,i} + I_{hkl,j})/2}
 </math>
 which gives us the average R-factor of two reflections measured d frames apart. As long as the plot is parallel to the x axis, there is no radiation damage. As soon as the plot starts to rise, we see that there is a systematic error contribution due to radiation damage.
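The R<sub>d</sub> formula can be sketched as follows. This is an illustration only and not how XDSSTAT is implemented: the function name `r_d` and the input layout (a dict from unique hkl to a list of (frame number, intensity) pairs) are assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations

def r_d(observations):
    """Return {d: R_d} for a dict {hkl: [(frame, intensity), ...]}.

    The input layout is an assumption made for this sketch.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for measurements in observations.values():
        # every pair of observations of the same unique reflection
        for (frame_i, int_i), (frame_j, int_j) in combinations(measurements, 2):
            d = abs(frame_i - frame_j)
            num[d] += abs(int_i - int_j)
            den[d] += (int_i + int_j) / 2.0
    return {d: num[d] / den[d] for d in sorted(num)}

# Made-up data in which intensities decay with frame number.
obs = {
    (1, 0, 0): [(1, 100.0), (3, 98.0), (11, 90.0)],
    (0, 1, 0): [(2, 50.0), (4, 49.0)],
}
for d, value in r_d(obs).items():
    print(d, round(value, 4))
```

Plotting the returned values against d gives the curve described above; in practice neighbouring values of d would probably be binned so that each point rests on enough pairs.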
@@ Line 48: / Line 64: @@
 To my knowledge, the only program that implements this currently (December 2008) is [[xds:XDSSTAT|XDSSTAT]].
+=== Comparing two sets of structure factor amplitudes or intensities ===
+The following is symmetric in the two sets being compared, and is suitable for comparing two data sets or two sets of model amplitudes:
+: <math>
+R_{scale}=\frac{\sum_{hkl}\vert F_{hkl,i}-F_{hkl,j}\vert}{0.5\sum_{hkl} (F_{hkl,i}+F_{hkl,j})}
+</math>
+for amplitudes, and analogously for intensities.
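A small sketch of R<sub>scale</sub> for two sets of amplitudes. The function name `r_scale`, the dict layout, and the choice to sum only over reflections present in both sets are assumptions of this example; the two sets are also assumed to be on a common scale already.

```python
def r_scale(f_i, f_j):
    """R_scale between two dicts {hkl: amplitude}.

    Only reflections present in both sets are compared (an assumption
    of this sketch); both sets are assumed to be on the same scale.
    """
    common = f_i.keys() & f_j.keys()
    num = sum(abs(f_i[hkl] - f_j[hkl]) for hkl in common)
    den = 0.5 * sum(f_i[hkl] + f_j[hkl] for hkl in common)
    return num / den

# Made-up amplitudes; (0, 0, 1) is missing from set_b and is skipped.
set_a = {(1, 0, 0): 10.0, (0, 1, 0): 20.0, (0, 0, 1): 5.0}
set_b = {(1, 0, 0): 12.0, (0, 1, 0): 18.0}
r_ab = r_scale(set_a, set_b)
r_ba = r_scale(set_b, set_a)
print(f"R_scale = {r_ab:.4f}")
```

Swapping the two arguments gives the same value, which is the symmetry property mentioned above.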
 === Model quality indicators ===
 * R and [[iucr:Free_R_factor|R<sub>free</sub>]] : the formula for both is
- <math>
- R=\frac{\sum_{hkl}\vert F_{hkl}^{obs}-F_{hkl}^{calc}\vert}{\sum_{hkl} F_{hkl}^{obs}}
- </math>
+: <math>
-<br>
+R=\frac{\sum_{hkl}\vert F_{hkl}^{obs}-F_{hkl}^{calc}\vert}{\sum_{hkl} F_{hkl}^{obs}}
-<br>
+</math>
 where <math>F_{hkl}^{obs}</math> and <math>F_{hkl}^{calc}</math> have to be placed on a common scale. R and R<sub>free</sub> differ in the set of reflections they are calculated from: R is calculated for the [[working set]], whereas R<sub>free</sub> is calculated for the [[test set]].
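As a sketch of how the same formula yields both R and R<sub>free</sub>, the function below is evaluated once on the working set and once on the test set. The name `r_factor`, the dict layout and the example numbers are assumptions made for illustration; the amplitudes are assumed to be on a common scale already, so no scaling step is shown.

```python
def r_factor(f_obs, f_calc, reflections):
    """R-factor over the given reflections.

    f_obs and f_calc are dicts {hkl: amplitude}, assumed to be
    scaled to each other beforehand (not done in this sketch).
    """
    num = sum(abs(f_obs[hkl] - f_calc[hkl]) for hkl in reflections)
    den = sum(f_obs[hkl] for hkl in reflections)
    return num / den

# Made-up amplitudes for four reflections.
f_obs = {(1, 0, 0): 100.0, (0, 1, 0): 50.0, (0, 0, 1): 80.0, (1, 1, 0): 40.0}
f_calc = {(1, 0, 0): 90.0, (0, 1, 0): 55.0, (0, 0, 1): 70.0, (1, 1, 0): 50.0}

working_set = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]  # used in refinement
test_set = [(1, 1, 0)]                           # held out for R_free

r_work = r_factor(f_obs, f_calc, working_set)
r_free = r_factor(f_obs, f_calc, test_set)
print(f"R = {r_work:.4f}, R_free = {r_free:.4f}")
```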
-== what do R-factors try to measure, and how to interpret their values? ==
-* relative deviation of
-=== Data quality ===
-* typical values: ...
-=== Model quality ===
+==== Relation between R and R<sub>free</sub> as a function of resolution ====
-==== Relation between R and R<sub>free</sub> as a function of resolution ====
+* The PDBe provides a service for plotting statistical properties of PDB entries against each other: http://www.ebi.ac.uk/pdbe-as/pdbestatistics/PDBeStatistics.jsp . One of the available properties is RDiff, the difference between R and R<sub>free</sub> for all structures that report both values. Select RDiff first, then set the second parameter to resolution to obtain the desired plot. The result is a 3D isometric plot which you can scale, and in which you can pick data points to view particular entries.
+* formula from Kleywegt and Jones (2002): R<sub>free</sub> = 1.065*R + 0.036
+* plot with empirical data: http://xray.bmc.uu.se/gerard/supmat/rfree2000/rfminusr_vs_resolution.gif
+* many more plots: http://xray.bmc.uu.se/gerard/supmat/rfree2000
+* harry plotter (java): http://xray.bmc.uu.se/gerard/supmat/rfree2000/plotter.html
+* When the resolution is plotted on a logarithmic scale, the most frequent values (modes) are practically linear functions of it, which allows easy interpolation / extrapolation (Urzhumtsev et al., 2009):
+        mode(R) = 0.091*ln(resolution) + 0.134
+        mode(Rfree-R)   = 0.024*ln(resolution) + 0.020
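The empirical relations quoted in the list above are easy to evaluate directly. The sketch below simply wraps the published coefficients; the function names are invented for this example, and the resolution is assumed to be given in Å and to enter through the natural logarithm, as the "ln" in the formulas suggests.

```python
import math

def expected_r_free(r):
    """Kleywegt and Jones (2002): R_free = 1.065*R + 0.036 (R as a fraction)."""
    return 1.065 * r + 0.036

def mode_r(resolution):
    """Urzhumtsev et al. (2009): most frequent R at a given resolution (assumed in Angstrom)."""
    return 0.091 * math.log(resolution) + 0.134

def mode_r_free_minus_r(resolution):
    """Urzhumtsev et al. (2009): most frequent R_free - R gap at a given resolution."""
    return 0.024 * math.log(resolution) + 0.020

resolution = 2.0  # example value
print(f"expected R_free for R = 0.20: {expected_r_free(0.20):.3f}")
print(f"mode(R) at {resolution} A: {mode_r(resolution):.3f}")
print(f"mode(R_free - R) at {resolution} A: {mode_r_free_minus_r(resolution):.3f}")
```

These are statistical trends across deposited structures, so a refinement whose values deviate from them is not necessarily wrong; large deviations are a reason to take a closer look.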
 References:
@@ Line 72: / Line 104: @@
 * GJ Kleywegt and TA Jones (2002). Homo Crystallographicus - Quo vadis? Structure 10, 465-472. (reprint from http://xray.bmc.uu.se/cgi-bin/gerard/reprint_mailer.pl?pref=65)
-- formula from that paper: R<sub>free</sub> = 1.065*R + 0.036
-- plot with empirical data: http://xray.bmc.uu.se/gerard/supmat/rfree2000/rfminusr_vs_resolution.gif
+* Urzhumtsev, Afonine & Adams (2009) Acta Cryst., D65, 1283-1291.
-- many more plots: http://xray.bmc.uu.se/gerard/supmat/rfree2000
-- harry plotter (java): http://xray.bmc.uu.se/gerard/supmat/rfree2000/plotter.html
 == what kinds of problems exist with these indicators? ==
@@ Line 86: / Line 113: @@
 * Sets of reflections used for calculating R<sub>free</sub> should be maintained throughout a project. This is nicely discussed at http://www.bmsc.washington.edu/people/merritt/xplor/rfree_example.html . Note that none of the programs mentioned for selecting thin shells will allow you to extend the set of shells to higher resolution if you want to preserve your existing R-free set.
+* R-values and twinning: [http://www.ysbl.york.ac.uk/refmac/papers/Rfactor.pdf Garib N. Murshudov (2011) "Some properties of crystallographic reliability index - Rfactor: effect of twinning" Appl. Comput. Math., V.10, N.2, 2011, pp.250-261]. From the paper, the R-value table for random models is:
+          twinning modelled   twinning not modelled
+ twin          0.41                 0.49
+ normal        0.52                 0.58
+Another paper which investigates the properties of R-values in the presence of twinning is [http://journals.iucr.org/d/issues/2013/07/00/ba5190/index.html P. R. Evans and G. N. Murshudov (2013) "How good are my data and what is the resolution?" Acta Cryst. (2013). D69, 1204-1214]. As the title indicates, this paper discusses at what resolution the data should be cut. One important finding is that a perfect model gives an R value of 42.0% (for a perfect twin, 29.1%) against pure noise. This tells us that a model that gives significantly lower R<sub>free</sub> in the (current) high resolution shell may benefit from including higher resolution data.
+* R-values and [[pseudo-translation]]: if your crystal has pseudo-translation and you solve the structure by molecular replacement, be aware that starting R-factors can be as high as 70-80%.
+* Data R-values are not meaningful at high resolution. This is discussed by [http://strucbio.biologie.uni-konstanz.de//strucbio/files/karplus2012_science.pdf Karplus and Diederichs (2012) "Linking crystallographic data and model quality". ''Science'' '''336''', 1030].
 ==Notes==
 <references/>
