2,669
edits
(→Example: fix )) |
m (→See also: fix link) |
||
(25 intermediate revisions by 2 users not shown) | |||
Line 2: | Line 2: | ||
==CC<sub>1/2</sub> calculation== | ==CC<sub>1/2</sub> calculation== | ||
CC<sub>1/2</sub> can be calculated with the so-called σ-τ method ([https:// | CC<sub>1/2</sub> can be calculated with the so-called σ-τ method ([https://doi.org/10.1107/S1600576716005471 Assmann, G., Brehm, W. and Diederichs, K. (2016) Identification of rogue datasets in serial crystallography. J. Appl. Cryst. 49, 1021-1028.]) by: | ||
: <math>CC_{1/2}=\frac{\sigma^2_{\tau}}{\sigma^2_{\tau}+\sigma^2_{\epsilon}} =\frac{\sigma^2_{y}- \frac{1}{2}\sigma^2_{\epsilon}}{\sigma^2_{y}+ \frac{1}{2}\sigma^2_{\epsilon}} </math> | : <math>CC_{1/2}=\frac{\sigma^2_{\tau}}{\sigma^2_{\tau}+\sigma^2_{\epsilon}} =\frac{\sigma^2_{y}- \frac{1}{2}\sigma^2_{\epsilon}}{\sigma^2_{y}+ \frac{1}{2}\sigma^2_{\epsilon}} </math> | ||
This requires calculation of <math>\sigma^2_{y} </math>, the variance of the average intensities | This requires calculation of <math>\sigma^2_{y} </math>, the variance of the average intensities, and <math>\sigma^2_{\epsilon} </math>, the average of the variances of the averaged (merged) intensities. | ||
== Method == | == Method == | ||
===''' <math>\sigma^2_{\epsilon} </math>'''=== | ===calculating ''' <math>\sigma^2_{\epsilon} </math>'''=== | ||
With <math>x_{j,i} </math> , a single observation j of all observations | With <math>x_{j,i} </math> , a single observation j of all n observations of one reflection i, the average of all sample variances of the mean across all unique reflections of a resolution shell is obtained by calculating the unbiased sample variance of the mean for every unique reflection i by: | ||
<math> | <math>s^2_{\epsilon i} = \frac{1}{n_{i}-1} \cdot \left ( \sum^{n_{i}}_{j} x^2_{j,i} - \frac{\left ( \sum^{n_{i}}_{j}x_{j,i} \right )^2}{n_{i}} \right ) / \frac{n_{i}}{2} </math> | ||
<math> | <math>s^2_{\epsilon i} </math> is divided by the factor <math>\frac{n}{2} </math>, because the variance of the sample mean (intensities of the merged observations) is the quantity of interest. The division by '''n/2''' takes care of providing the variance of the mean ([https://en.wikipedia.org/wiki/Sample_mean_and_covariance#Variance_of_the_sample_mean ]) (merged) intensity of the '''half'''-datasets, as defined in [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3457925/ Karplus and Diederichs (2012)]. These "variances of means" are averaged over all unique reflections of the resolution shell: | ||
<math> | <math>s^2_{\epsilon}=\sum^N_{i} s^2_{\epsilon i} / N </math> | ||
Line 24: | Line 24: | ||
If the standard deviations <math>\sigma_{int\_j,i} </math> for the single observations are considered as weights for the CC<sub>1/2</sub> calculation, with <math>w_{j,i}=\frac{1}{\sigma_{int\_j,i}^2} </math> the unbiased '''weighted''' sample variance of the mean for every unique reflection i is | If the standard deviations <math>\sigma_{int\_j,i} </math> for the single observations are considered as weights for the CC<sub>1/2</sub> calculation, with <math>w_{j,i}=\frac{1}{\sigma_{int\_j,i}^2} </math>, one way to obtain the unbiased '''weighted''' sample variance of the half-dataset mean for every unique reflection i is: | ||
<math> | <math>s^2_{\epsilon i\_w} = \frac{n_{i}}{n_{i}-1} \cdot \left ( \frac{\sum^{n_{i}}_{j}w_{j,i} x^2_{j,i}}{\sum^{n_{i}}_{j}w_{j,i}} -\left ( \frac{ \sum^{n_{i}}_{j}w_{j,i}x_{j,i} }{\sum^{n_{i}}_{j}w_{j,i}}\right )^2 \right ) / \frac{n_{i}}{2} </math> | ||
These " weighted variances of means" are averaged over all unique reflections of the resolution shell: | These " weighted variances of means" are averaged over all unique reflections of the resolution shell: | ||
<math> | <math>s^2_{\epsilon_w}=\sum^N_{i} s^2_{\epsilon i\_w} / N </math> | ||
It should be noted that it is not straightforward to define the correct way to calculate a weighted variance (and the weighted variance of the mean). The formula <math>s^2_w = \frac{n_{i}}{n_{i}-1} \cdot \left ( \frac{\sum^{n_{i}}_{j}w_{j,i} x^2_{j,i}}{\sum^{n_{i}}_{j}w_{j,i}} -\left ( \frac{ \sum^{n_{i}}_{j}w_{j,i}x_{j,i} }{\sum^{n_{i}}_{j}w_{j,i}}\right )^2 \right )</math> is - after some manipulation - the same as that found at [https://stats.stackexchange.com/questions/6534/how-do-i-calculate-a-weighted-standard-deviation-in-excel],[https://www.itl.nist.gov/div898/software/dataplot/refman2/ch2/weightsd.pdf]. Other ways of calculating the weighted variance of the mean ([https://en.wikipedia.org/wiki/Weighted_arithmetic_mean],[https://www.gnu.org/software/gsl/manual/html_node/Weighted-Samples.html]) should be considered. | |||
---- | ---- | ||
===''' <math>\sigma^2_{y} </math>'''=== | ===calculating ''' <math>\sigma^2_{y} </math>'''=== | ||
Let N be the number of reflections in a resolution shell. With <math>\overline{x}_{i}= \sum^n_{j} x_{j,i}</math> , the unbiased sample variance from all averaged intensities of all unique reflections is calculated by: | Let N be the number of reflections in a resolution shell. With <math>\overline{x}_{i}= \frac{1} {n} \sum^n_{j} x_{j,i}</math> , the unbiased sample variance from all averaged intensities of all unique reflections is calculated by: | ||
<math> | <math>s^2_{y} = \frac{1}{N-1} \cdot \left (\sum^N_{i} \overline{x}_{i}^2 - \frac{\left ( \sum^N_{i} \overline{x}_{i} \right )^2}{ N} \right ) </math> | ||
---- | |||
If the standard deviations <math>\sigma_{int\_j,i} </math> for the single observations are used for weighting, <math>s^2_{y}</math> is obtained from | |||
<math>\overline{x}_{i_w} = \frac{\sum^n_{j} w_{j,i} x_{j,i}} {\sum^n_{j}w_{j,i}}</math>: | |||
<math>s^2_{y_w} = \frac{1}{N-1} \cdot \left (\sum^N_{i} \overline{x}_{i_w}^2 - \frac{\left ( \sum^N_{i} \overline{x}_{i_w} \right )^2}{ N} \right ) </math> | |||
== Example == | == Example == | ||
Line 94: | Line 99: | ||
sumibar=sumi+xbar | sumibar=sumi+xbar | ||
sumibar2=sumibar2+xbar**2 | sumibar2=sumibar2+xbar**2 | ||
sumsig2eps= | sumsig2eps=sumsig2eps + (SUM(iobs(:,i)**2)-xbar**2*nobs)/(nobs-1)/(nobs/2) | ||
END DO | END DO | ||
sig2y=(sumibar2-sumibar**2/nref)/(nref-1) | sig2y=(sumibar2-sumibar**2/nref)/(nref-1) | ||
Line 111: | Line 116: | ||
== why CC<sub>1/2</sub> can be negative == | == why CC<sub>1/2</sub> can be negative == | ||
If the numerator of the formula becomes negative, CC<sub>1/2</sub> is negative. This happens if the variance of the average intensities across the unique reflections of a resolution shell is low, but the individual measurements of each unique reflection vary strongly. This is discussed in §4.1 of [https://doi.org/10.1107/S1600576716005471 Assmann, G., Brehm, W. and Diederichs, K. (2016) Identification of rogue datasets in serial crystallography. J. Appl. Cryst. 49, 1021-1028.] | |||
== Implementation == | |||
This way of calculating CC<sub>1/2</sub> is implemented in [[XDSCC12]] and in [http://www.desy.de/~twhite/crystfel/manual-partialator.html partialator] of [http://www.desy.de/~twhite/crystfel/relnotes-0.8.0 CrystFEL]. | |||
== See also == | |||
[https://www.youtube.com/watch?v=LirxJIcQ6T0 CC* - Linking crystallographic model and data quality.] Video recorded at SBGrid/NE-CAT workshop 2014; the PDF is [https://wiki.uni-konstanz.de/pub/CC%20-%20Linking%20crystallographic%20model%20and%20data%20quality.pdf here]. The sound is poor. |