Data quality: Difference between revisions

Revision as of 16:10, 9 May 2008

What is the resolution of my dataset?

First of all, it is limited by completeness. In practical terms this means that the highest resolution you can get is the resolution at the edge of the detector. If you collected enough frames, you may be able to squeeze out another 0.1 Å by processing data all the way to the corner. Usually, however, the detector is positioned close enough to the crystal that there is no diffraction at the edge, and the resolution limit should then be chosen based on the strength of the diffraction.
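As a rough illustration of the geometric limit, the resolution at the edge and at the corner of the detector can be estimated from Bragg's law. The sketch below assumes a flat, square detector perpendicular to the beam with the beam hitting its center; the wavelength, distance and detector size are made-up example values, and the function name is hypothetical.

```python
import math

def resolution_at_radius(wavelength, distance, radius):
    """Bragg spacing d (in the units of wavelength) for a spot at the given
    radius on a flat detector perpendicular to the beam.

    distance and radius must share the same length unit (e.g. mm).
    """
    two_theta = math.atan2(radius, distance)       # scattering angle 2*theta
    return wavelength / (2.0 * math.sin(two_theta / 2.0))

# Assumed example geometry (not from any particular experiment).
wavelength = 1.0     # Angstrom
distance = 150.0     # mm, crystal-to-detector
half_width = 100.0   # mm, half the side length of a square detector

d_edge = resolution_at_radius(wavelength, distance, half_width)
d_corner = resolution_at_radius(wavelength, distance, half_width * math.sqrt(2.0))

print(f"edge:   {d_edge:.2f} A")
print(f"corner: {d_corner:.2f} A")
```

How much the corner gains over the edge depends on the geometry; completeness in the corner region is lower because only part of each resolution ring is recorded.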

This limit is commonly based on average [math]\displaystyle{ I/\sigma }[/math]. Examples of such choices are:

[math]\displaystyle{ I/\sigma=1 }[/math] in the highest resolution shell
[math]\displaystyle{ I/\sigma=2 }[/math] in the highest resolution shell
at least 50% of reflections in the highest resolution shell have [math]\displaystyle{ I/\sigma }[/math] > 2
...

Some of these choices are more liberal than others (and so will result in higher resolution values). It is probably not worthwhile to argue which choice is the best, since it is indeed a matter of personal preference.
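To make the effect of the different criteria concrete, here is a minimal sketch that picks a resolution limit from per-shell average I/sigma. The shell table is invented for illustration, and `resolution_cutoff` is a hypothetical helper, not part of any data processing program.

```python
def resolution_cutoff(shells, min_i_over_sigma):
    """Return the high-resolution limit (d_min of the last accepted shell).

    shells: list of (d_min, mean_i_over_sigma) tuples ordered from low to
            high resolution, i.e. with decreasing d_min.
    Scanning stops at the first shell whose mean I/sigma falls below the
    threshold; None is returned if even the first shell fails.
    """
    cutoff = None
    for d_min, i_over_sigma in shells:
        if i_over_sigma < min_i_over_sigma:
            break
        cutoff = d_min
    return cutoff

# Invented per-shell statistics: (d_min in Angstrom, mean I/sigma).
shells = [(3.0, 25.0), (2.5, 12.0), (2.2, 5.1), (2.0, 2.3), (1.9, 1.2), (1.8, 0.6)]

strict = resolution_cutoff(shells, 2.0)    # I/sigma >= 2 criterion
liberal = resolution_cutoff(shells, 1.0)   # I/sigma >= 1 criterion
print(strict, liberal)
```

With these invented numbers the more liberal criterion keeps one additional shell, which is the kind of difference the choices above amount to.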

There is probably not much reason to limit resolution by R_merge. When the resolution limit is selected based on R_merge being less than a certain cutoff, the argument is that in higher resolution shells the variation among independent measurements of the intensity of the same reflection is too high. But such variation is indeed bound to be high for weak reflections. R_merge may and should be used as the measure of the overall data consistency (e.g. of two independent datasets the one that has higher R_merge probably is noisier).

Of course you can achieve lower R-factors in refinement by setting the resolution limit based on some cutoff value of R_merge. It is perfectly OK to aspire to low R-factors, but achieving this by throwing away good data isn't. The better strategy is probably to choose a generous high resolution limit early during structure solution, and to decide near the end of the refinement, by inspecting maps and comparing model R-factors at different resolutions, at which resolution the useful signal vanishes in the noise.

Improved indicators for data quality

R_merge is the wrong quantity to look at altogether, because

it depends on the multiplicity (unfortunately often called redundancy): the higher the multiplicity, the higher R_merge becomes
it assesses data consistency, not the quality of the reduced data

This has been discussed by Diederichs and Karplus^[1], who suggest a multiplicity-independent version called R_meas, which unfortunately is not used by everyone because the formula gives higher values than R_merge. R-factors for data quality assessment were also suggested by Diederichs and Karplus, and Weiss and Hilgenfeld ^[2]. Weiss ^[3] showed that these R-factors are indeed strongly correlated with the quality of the data.
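The difference between the two statistics can be sketched as follows. R_merge sums the absolute deviations of each observation from the mean of its reflection and divides by the summed intensities; R_meas weights each reflection's deviations by sqrt(n/(n-1)), where n is its multiplicity. The function name and the input layout (a dict from hkl to a list of intensities) are assumptions made for this example, and the intensities are invented.

```python
import math

def merging_r_factors(observations):
    """Compute (R_merge, R_meas) from unmerged intensities.

    observations: dict mapping an hkl tuple to the list of intensities
                  measured for that unique reflection.
    Reflections measured only once carry no consistency information
    and are skipped.
    """
    num_merge = 0.0
    num_meas = 0.0
    denom = 0.0
    for intensities in observations.values():
        n = len(intensities)
        if n < 2:
            continue
        mean = sum(intensities) / n
        abs_dev = sum(abs(i - mean) for i in intensities)
        num_merge += abs_dev
        num_meas += math.sqrt(n / (n - 1)) * abs_dev  # multiplicity correction
        denom += sum(intensities)
    if denom == 0.0:
        raise ValueError("no reflections with multiplicity >= 2")
    return num_merge / denom, num_meas / denom

# Invented example data: two unique reflections with multiplicity 2 and 3.
observations = {
    (1, 0, 0): [100.0, 110.0],
    (0, 1, 0): [50.0, 44.0, 53.0],
}
r_merge, r_meas = merging_r_factors(observations)
print(f"R_merge = {r_merge:.3f}, R_meas = {r_meas:.3f}")
```

Since sqrt(n/(n-1)) is always greater than 1, R_meas is never smaller than R_merge for the same data, which is the reason given above for its slow adoption.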

References

↑ K. Diederichs and P.A. Karplus (1997). Improved R-factors for diffraction data analysis in macromolecular crystallography. Nature Struct. Biol. 4, 269-275 [1]
↑ M.S. Weiss and R. Hilgenfeld (1997) On the use of the merging R-factor as a quality indicator for X-ray data. J. Appl. Crystallogr. 30, 203-205 [2]
↑ M.S. Weiss (2001) Global indicators of X-ray data quality. J. Appl. Cryst. 34, 130-135 [3]

@@ Line 5: / Line 5: @@
 This limit is commonly based on average <math>I/\sigma</math>.  Examples of such choices are:
+* <math>I/\sigma=1</math> in the highest resolution shell
+* <math>I/\sigma=2</math> in the highest resolution shell
+* at least 50% of reflections in the highest resolution shell have <math>I/\sigma</math> > 2
+* ...
-- <math>I/\sigma=1</math> in the highest resolution shell
+Some of these choices are more liberal than others (and so will result in higher resolution values).  It is probably not worthwhile to argue which choice is the best, since it is indeed a matter of personal preference.
-- <math>I/\sigma=2</math> in the highest resolution shell
+There is probably not much reason to limit resolution by R<sub>merge</sub>.  When the resolution limit is selected based on R<sub>merge</sub> being less than a certain cutoff, the argument is that in higher resolution shells the variation among independent measurements of the intensity of the same reflection is too high.  But such variation is indeed bound to be high for weak reflections.  R<sub>merge</sub> may and should be used as the measure of the overall data consistency (e.g. of two independent datasets the one that has higher R<sub>merge</sub> probably is noisier).
-- at least 50% of reflections in the highest resolution shell have <math>I/\sigma</math> > 2
+Of course you can achieve lower R-factors in refinement by setting the resolution limit based on some cutoff value of R<sub>merge</sub>. It is perfectly OK to aspire low R-factors, but to achieve this by throwing away good data isn't. The better strategy probably is to choose a generous high resolution limit early during structure solution, and to decide near the end of the refinement, by inspecting maps and comparing model R-factors at different resolutions, at which resolution the useful signal vanishes in the noise.
-...
+== Improved indicators for data quality ==
-Some of these choices are more liberal than others (and so will give you higher resolution).  It is probably not worthwhile to argue which choice is the best, since it is indeed a matter of personal preference.
+R<sub>merge</sub> is the wrong quantity to look at altogether, because
-There is not probably much reason to limit resolution by R<sub>merge</sub>.  When the resolution limit is selected based on R<sub>merge</sub> being less than certain cutoff, the argument is that in higher resolution shells the variation among independent measurements of the intensity of the same reflection is too high.  But such variation is bound to be high for weak reflections.  Plus, factors such as redundancy may significantly affect R<sub>merge</sub>.  R<sub>merge</sub> may and should be used as the measure of the overall data quality (e.g. of two independent datasets the one that has higher R<sub>merge</sub> probably is noisier).
-One thing you achieve by choosing resolution limit based on R<sub>merge</sub> (which generally means that your <math>I/\sigma</math> in the highest resolution shell will be >4), of course, is lower R-factors in refinement.  It is perfectly OK to aspire low R-factors, but to achieve this by throwing away data probably isn't.
-== R<sub>merge</sub> criticism ==
-Finally, R<sub>merge</sub> is the wrong quantitiy to look at altogether, because
 * it depends on the multiplicity (unfortunately often called redundancy): the higher the multiplicity, the higher R<sub>merge</sub> becomes
 * it assesses data consistency, not the quality of the reduced data
-This has been discussed by Diederichs and Karplus(<ref name="DiKa97">K. Diederichs and P.A. Karplus (1997). Improved R-factors for diffraction data analysis in macromolecular crystallography. Nature Struct. Biol. 4, 269-275 [http://strucbio.biologie.uni-konstanz.de/strucbio/files/nsb-1997.pdf]</ref>), who suggest a multiplicity-independant version called R<sub>meas</sub>, which unfortunately is not used by everyone because the formula gives higher values than R<sub>merge</sub>. R-factors for data quality assessment were also suggested by Diederichs and Karplus, and Weiss and Hilgenfeld <ref name="WeHi97">M.S. Weiss and R. Hilgenfeld (1997) On the use of the merging R-factor as a quality indicator for X-ray data. J. Appl. Crystallogr. 30, 203-205[http://dx.doi.org/10.1107/S0021889897003907]</ref>. Weiss <ref name="We01">M.S. Weiss. Global indicators of X-ray data quality. J. Appl. Cryst. (2001). 34, 130-135 [http://dx.doi.org/10.1107/S0021889800018227]</ref> showed that these R-factors are indeed strongly correlated with the quality of the data.
+This has been discussed by Diederichs and Karplus<ref name="DiKa97">K. Diederichs and P.A. Karplus (1997). Improved R-factors for diffraction data analysis in macromolecular crystallography. Nature Struct. Biol. 4, 269-275 [http://strucbio.biologie.uni-konstanz.de/strucbio/files/nsb-1997.pdf]</ref>), who suggest a multiplicity-independant version called R<sub>meas</sub>, which unfortunately is not used by everyone because the formula gives higher values than R<sub>merge</sub>. R-factors for data quality assessment were also suggested by Diederichs and Karplus, and Weiss and Hilgenfeld <ref name="WeHi97">M.S. Weiss and R. Hilgenfeld (1997) On the use of the merging R-factor as a quality indicator for X-ray data. J. Appl. Crystallogr. 30, 203-205 [http://dx.doi.org/10.1107/S0021889897003907]</ref>. Weiss <ref name="We01">M.S. Weiss (2001) Global indicators of X-ray data quality. J. Appl. Cryst. 34, 130-135 [http://dx.doi.org/10.1107/S0021889800018227]</ref> showed that these R-factors are indeed strongly correlated with the quality of the data.
 == References ==
 <references/>
