CC1/2
Revision as of 09:08, 6 September 2018
number of reflection pairs
CORRECT.LP and XSCALE.LP do not explicitly state the number of reflection pairs that were used to calculate CC1/2.
However, for each resolution shell the number can be calculated from the numbers that are available: the NUMBER OF UNIQUE REFLECTIONS (X), the NUMBER OF OBSERVED REFLECTIONS (Y), and the number of COMPARED reflections (Z). Z is the total number of unmerged observations that contributed to the CC1/2 and R-value calculations.
The number of reflection pairs used for the CC1/2 calculation then follows. Y-Z is the number of observations that were not compared, i.e. the number of unique reflections that have only a single observation. The remaining X-Y+Z unique reflections have multiple observations, so X-Y+Z reflection pairs went into CC1/2.
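The bookkeeping above can be checked with a short Python sketch. The function names reflection_pair_count and shell_totals are hypothetical, and shell_totals assumes, as described above, that Z counts every observation of a unique reflection that was observed more than once.

```python
def reflection_pair_count(unique: int, observed: int, compared: int) -> int:
    """Reflection pairs used for CC1/2 in one resolution shell.

    unique:   NUMBER OF UNIQUE REFLECTIONS (X)
    observed: NUMBER OF OBSERVED REFLECTIONS (Y)
    compared: number of COMPARED reflections (Z)
    """
    singles = observed - compared  # Y - Z: unique reflections observed only once
    return unique - singles        # X - Y + Z: unique reflections observed more than once


def shell_totals(multiplicities: list[int]) -> tuple[int, int, int]:
    """Hypothetical helper: X, Y, Z from per-reflection multiplicities,
    assuming Z counts all observations of multiply-observed reflections."""
    x = len(multiplicities)
    y = sum(multiplicities)
    z = sum(m for m in multiplicities if m > 1)
    return x, y, z


# Hypothetical shell with five unique reflections.
mults = [1, 3, 2, 1, 4]
X, Y, Z = shell_totals(mults)
print(X, Y, Z)                         # 5 11 9
print(reflection_pair_count(X, Y, Z))  # 3, the reflections with multiplicity > 1
```

For the two-reflection example at the end of this page (X = 2, Y = 12 and, assuming every observation is compared, Z = 12) the count is 2.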
why CC1/2 can be negative
There is a mathematical reason, explained in §4.1 of Assmann, G., Brehm, W. and Diederichs, K. (2016). Identification of rogue datasets in serial crystallography. J. Appl. Cryst. 49, 1021-1028.
CC1/2 calculation
CC1/2 is calculated by:
- [math]\displaystyle{ CC_{1/2}=\frac{\sigma^2_{\tau}}{\sigma^2_{\tau}+\sigma^2_{\epsilon}} =\frac{\sigma^2_{y}- \frac{1}{2}\sigma^2_{\epsilon}}{\sigma^2_{y}+ \frac{1}{2}\sigma^2_{\epsilon}} }[/math]
This requires calculation of [math]\displaystyle{ \sigma^2_{y} }[/math], the variance of the average intensities across the unique reflections of a resolution shell, and [math]\displaystyle{ \sigma^2_{\epsilon} }[/math], the average of all sample variances of the mean across all unique reflections of a resolution shell.
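As a minimal sketch, the right-hand form of the formula transcribes directly into Python (the name cc_half is ours). It also shows one way to see how the estimate can fall below zero: the numerator is negative whenever the estimated [math]\displaystyle{ \sigma^2_{y} }[/math] is smaller than [math]\displaystyle{ \frac{1}{2}\sigma^2_{\epsilon} }[/math].

```python
def cc_half(sigma2_y: float, sigma2_eps: float) -> float:
    """CC1/2 = (sigma2_y - sigma2_eps/2) / (sigma2_y + sigma2_eps/2)."""
    half_eps = 0.5 * sigma2_eps
    # Comparing the two forms of the formula, the numerator plays the role of sigma2_tau.
    return (sigma2_y - half_eps) / (sigma2_y + half_eps)


# Values from the worked example on this page.
print(round(cc_half(190458.6533, 10605.7733), 4))  # 0.9458

# Hypothetical noise-dominated shell: sigma2_y < sigma2_eps / 2.
print(cc_half(100.0, 300.0))  # -0.2
```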
Implementation
[math]\displaystyle{ \sigma^2_{\epsilon} }[/math]
The average of all sample variances of the mean across all unique reflections of a resolution shell is obtained by calculating the sample variance of the mean for every unique reflection i by:
[math]\displaystyle{ \sigma^2_{\epsilon i} = \frac{1}{n-1} \cdot \left ( \sum^n_{j} x^2_{j,i} - \frac{\left ( \sum^n_{j}x_{j,i} \right )^2}{ n} \right ) / \frac{n}{2} }[/math]
Here [math]\displaystyle{ x_{j,i} }[/math] is a single observation j out of the n observations of reflection i. The unbiased sample variance of these observations is divided by the factor [math]\displaystyle{ \frac{n}{2} }[/math] because the quantity of interest is the variance of the sample mean (the intensity of the merged observations), not the variance of a single observation. Since each half-dataset contains n/2 of the observations, this division gives the variance of the mean (merged) intensity of a half-dataset, as defined in the Wikipedia article "Sample mean and covariance" (https://en.wikipedia.org/wiki/Sample_mean_and_covariance#Variance_of_the_sample_mean). These "variances of means" are averaged over all unique reflections of the resolution shell:
[math]\displaystyle{ \sigma^2_{\epsilon} = \frac{1}{N} \sum^N_{i} \sigma^2_{\epsilon i} }[/math]
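The two steps above (per-reflection variance of the half-dataset mean, then the average over the shell) can be sketched with Python's statistics module, whose variance() uses the unbiased n-1 divisor. The function names are hypothetical; the intensities are those of the two unique reflections in the example at the end of this page.

```python
from statistics import mean, variance


def sigma2_eps_i(observations: list[float]) -> float:
    """Variance of the half-dataset mean for one unique reflection:
    unbiased sample variance (divisor n-1) divided by n/2."""
    n = len(observations)
    if n < 2:
        raise ValueError("need at least two observations")
    return variance(observations) / (n / 2)


def sigma2_eps(reflections: list[list[float]]) -> float:
    """Average of sigma2_eps_i over the N unique reflections of a shell."""
    return mean(sigma2_eps_i(obs) for obs in reflections)


refl_1 = [915.6, 558.4, 630.1, 925.6, 258.4, 730.1]
refl_2 = [23.95, 90.65, 59.81, 33.95, 90.65, 16.08]
print(round(sigma2_eps_i(refl_1), 2))          # 20848.22
print(round(sigma2_eps([refl_1, refl_2]), 2))  # 10605.77
```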
[math]\displaystyle{ \sigma^2_{y} }[/math]
The unbiased sample variance from all averaged intensities of all unique reflections is calculated by:
[math]\displaystyle{ \sigma^2_{y} = \frac{1}{N-1} \cdot \left ( \sum^N_{i} \overline{x}_{i}^2 - \frac{\left ( \sum^N_{i} \overline{x}_{i} \right )^2}{ N} \right ) }[/math]
Here [math]\displaystyle{ \overline{x}_{i}= \frac{1}{n} \sum^n_{j} x_{j,i} }[/math] is the average intensity of all n observations from all frames/crystals of one unique reflection i. This is done for all N unique reflections in a resolution shell.
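A minimal sketch of this step, again using the statistics module (variance() divides by N-1) and the two example reflections from the end of this page; the function name sigma2_y is hypothetical.

```python
from statistics import mean, variance


def sigma2_y(reflections: list[list[float]]) -> float:
    """Unbiased sample variance (divisor N-1) of the per-reflection
    mean intensities over the N unique reflections of a shell."""
    means = [mean(obs) for obs in reflections]  # x-bar_i = (1/n) * sum_j x_{j,i}
    if len(means) < 2:
        raise ValueError("need at least two unique reflections")
    return variance(means)


refl_1 = [915.6, 558.4, 630.1, 925.6, 258.4, 730.1]
refl_2 = [23.95, 90.65, 59.81, 33.95, 90.65, 16.08]
print(round(sigma2_y([refl_1, refl_2]), 2))  # 190458.66
```

In double precision this gives about 190458.66, which differs only in the last digits from the 190458.6533 quoted in the example; we assume the difference comes from the numerical precision used for the page's numbers. It does not change CC1/2 at four decimals.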
Example
An example is shown for a very simplified data file (unmerged ASCII.HKL). Only two frames/crystals are considered, and the diffraction pattern consists of only two unique reflections, each observed three times in every frame/crystal, i.e. six observations per unique reflection.
First reflection with 6 observations:
h k l   int        σ(int)     #dataset
2 0 0   9.156E+02  3.686E+00  1
0 2 0   5.584E+02  3.093E+00  1
0 0 2   6.301E+02  2.405E+01  1
2 0 0   9.256E+02  3.686E+00  2
0 2 0   2.584E+02  3.093E+00  2
0 0 2   7.301E+02  2.405E+01  2
[math]\displaystyle{ \overline{x}_{i} }[/math], the average intensity of all observations from all frames/crystals of this reflection = 669.6999
[math]\displaystyle{ \sigma^2_{\epsilon i} }[/math], the unbiased sample variance of the mean of all observations of this unique reflection i = 20848.2198 (62544.6597/(n/2))
Second reflection with 6 observations:
h k l   int        σ(int)     #dataset
1 1 2   2.395E+01  8.932E+01  1
1 2 1   9.065E+01  7.407E+00  1
2 1 1   5.981E+01  9.125E+00  1
1 1 2   3.395E+01  8.932E+01  2
1 2 1   9.065E+01  7.407E+00  2
2 1 1   1.608E+01  2.215E+01  2
[math]\displaystyle{ \overline{x}_{i} }[/math], the average intensity of all observations from all frames/crystals of this reflection = 52.5150
[math]\displaystyle{ \sigma^2_{\epsilon i} }[/math], the unbiased sample variance of the mean of all observations of this unique reflection i = 363.3267 (1089.9803/(n/2))
[math]\displaystyle{ \sigma^2_{\epsilon} }[/math] , the average of all the [math]\displaystyle{ \sigma^2_{\epsilon i} }[/math] = 10605.7733
[math]\displaystyle{ \sigma^2_{y} }[/math], the variance of all the averaged intensities = 190458.6533
As a result of these calculations, CC1/2 = 0.9458 = (190458.6533 - 0.5*10605.7733)/(190458.6533 + 0.5*10605.7733).
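The whole example can be reproduced end to end with the Python sketch below, which starts from the rows of the two tables. The reflection id column is hypothetical: it groups the hkl by hand exactly as the tables above do and is not derived from space-group symmetry. The #dataset column is carried along but not used, because this formula-based estimate obtains the half-dataset variance from all observations via the n/2 factor rather than from an explicit split into two halves.

```python
from collections import defaultdict
from statistics import mean, variance

# (reflection id, h, k, l, intensity, dataset) from the example tables.
ROWS = [
    (1, 2, 0, 0, 9.156e2, 1), (1, 0, 2, 0, 5.584e2, 1), (1, 0, 0, 2, 6.301e2, 1),
    (1, 2, 0, 0, 9.256e2, 2), (1, 0, 2, 0, 2.584e2, 2), (1, 0, 0, 2, 7.301e2, 2),
    (2, 1, 1, 2, 2.395e1, 1), (2, 1, 2, 1, 9.065e1, 1), (2, 2, 1, 1, 5.981e1, 1),
    (2, 1, 1, 2, 3.395e1, 2), (2, 1, 2, 1, 9.065e1, 2), (2, 2, 1, 1, 1.608e1, 2),
]


def cc_half_from_rows(rows):
    """Return (CC1/2, sigma2_eps, sigma2_y) for one resolution shell."""
    groups = defaultdict(list)
    for refl_id, _h, _k, _l, intensity, _dataset in rows:
        groups[refl_id].append(intensity)
    # Assumption: reflections observed only once are skipped, since they have
    # no sample variance (compare the reflection-pair count at the top of the page).
    obs = [v for v in groups.values() if len(v) >= 2]
    s2_eps = mean(variance(v) / (len(v) / 2) for v in obs)  # mean variance of half-dataset means
    s2_y = variance([mean(v) for v in obs])                  # variance of merged intensities
    cc = (s2_y - 0.5 * s2_eps) / (s2_y + 0.5 * s2_eps)
    return cc, s2_eps, s2_y


cc, s2_eps, s2_y = cc_half_from_rows(ROWS)
print(f"sigma2_eps = {s2_eps:.2f}, sigma2_y = {s2_y:.2f}, CC1/2 = {cc:.4f}")
# sigma2_eps = 10605.77, sigma2_y = 190458.66, CC1/2 = 0.9458
```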