# CC1/2: Difference between revisions

(→\sigma^2_{\epsilon}: change nomenclature from sigma to s) |
|||

Line 14: | Line 14: | ||

With <math>x_{j,i} </math> , a single observation j of all observations n of one reflection i, the average of all sample variances of the mean across all unique reflections of a resolution shell is obtained by calculating the unbiased sample variance of the mean for every unique reflection i by: | With <math>x_{j,i} </math> , a single observation j of all observations n of one reflection i, the average of all sample variances of the mean across all unique reflections of a resolution shell is obtained by calculating the unbiased sample variance of the mean for every unique reflection i by: | ||

<math> | <math>s^2_{\epsilon i} = \frac{1}{n_{i}-1} \cdot \left ( \sum^{n_{i}}_{j} x^2_{j,i} - \frac{\left ( \sum^{n_{i}}_{j}x_{j,i} \right )^2}{n_{i}} \right ) / \frac{n_{i}}{2} </math> | ||

<math> | <math>s^2_{\epsilon i} </math> is divided by the factor <math>\frac{n}{2} </math>, because the variance of the sample mean (intensities of the merged observations) is the quantity of interest. The division by '''n/2''' takes care of providing the variance of the mean ([https://en.wikipedia.org/wiki/Sample_mean_and_covariance#Variance_of_the_sample_mean ]) (merged) intensity of the '''half'''-datasets, as defined in [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3457925/ Karplus and Diederichs (2012)]. These "variances of means" are averaged over all unique reflections of the resolution shell: | ||

<math>\sigma^2_{\epsilon}=\sum^N_{i} \sigma^2_{\epsilon i} / N </math> | <math>\sigma^2_{\epsilon}=\sum^N_{i} \sigma^2_{\epsilon i} / N </math> | ||

Line 26: | Line 26: | ||

If the standard deviations <math>\sigma_{int\_j,i} </math> for the single observations are considered as weights for the CC<sub>1/2</sub> calculation, with <math>w_{j,i}=\frac{1}{\sigma_{int\_j,i}^2} </math> the unbiased '''weighted''' sample variance of the mean for every unique reflection i is obtained by: | If the standard deviations <math>\sigma_{int\_j,i} </math> for the single observations are considered as weights for the CC<sub>1/2</sub> calculation, with <math>w_{j,i}=\frac{1}{\sigma_{int\_j,i}^2} </math> the unbiased '''weighted''' sample variance of the mean for every unique reflection i is obtained by: | ||

<math> | <math>s^2_{\epsilon i\_w} = \frac{n_{i}}{n_{i}-1} \cdot \left ( \frac{\sum^{n_{i}}_{j}w_{j,i} x^2_{j,i}}{\sum^{n_{i}}_{j}w_{j,i}} -\left ( \frac{ \sum^{n_{i}}_{j}w_{j,i}x_{j,i} }{\sum^{n_{i}}_{j}w_{j,i}}\right )^2 \right ) / \frac{n_{i}}{2} </math> | ||

These " weighted variances of means" are averaged over all unique reflections of the resolution shell: | These " weighted variances of means" are averaged over all unique reflections of the resolution shell: | ||

<math> | <math>s^2_{\epsilon_w}=\sum^N_{i} s^2_{\epsilon i\_w} / N </math> | ||

It should be noted that it is not straightforward to define the correct way to calculate a weighted variance (and the weighted variance of the mean). The formula <math>s^2_w = \frac{n_{i}}{n_{i}-1} \cdot \left ( \frac{\sum^{n_{i}}_{j}w_{j,i} x^2_{j,i}}{\sum^{n_{i}}_{j}w_{j,i}} -\left ( \frac{ \sum^{n_{i}}_{j}w_{j,i}x_{j,i} }{\sum^{n_{i}}_{j}w_{j,i}}\right )^2 \right )</math> is - after some manipulation - the same as that found at [https://stats.stackexchange.com/questions/6534/how-do-i-calculate-a-weighted-standard-deviation-in-excel],[https://www.itl.nist.gov/div898/software/dataplot/refman2/ch2/weightsd.pdf]. | It should be noted that it is not straightforward to define the correct way to calculate a weighted variance (and the weighted variance of the mean). The formula <math>s^2_w = \frac{n_{i}}{n_{i}-1} \cdot \left ( \frac{\sum^{n_{i}}_{j}w_{j,i} x^2_{j,i}}{\sum^{n_{i}}_{j}w_{j,i}} -\left ( \frac{ \sum^{n_{i}}_{j}w_{j,i}x_{j,i} }{\sum^{n_{i}}_{j}w_{j,i}}\right )^2 \right )</math> is - after some manipulation - the same as that found at [https://stats.stackexchange.com/questions/6534/how-do-i-calculate-a-weighted-standard-deviation-in-excel],[https://www.itl.nist.gov/div898/software/dataplot/refman2/ch2/weightsd.pdf]. |

## Revision as of 20:28, 13 April 2019

## CC_{1/2} calculation

CC_{1/2} can be calculated with the so-called σ-τ method (Assmann, G., Brehm, W. and Diederichs, K. (2016) Identification of rogue datasets in serial crystallography (2016) J. Appl. Cryst. 49, 1021-1028.) by:

- [math]\displaystyle{ CC_{1/2}=\frac{\sigma^2_{\tau}}{\sigma^2_{\tau}+\sigma^2_{\epsilon}} =\frac{\sigma^2_{y}- \frac{1}{2}\sigma^2_{\epsilon}}{\sigma^2_{y}+ \frac{1}{2}\sigma^2_{\epsilon}} }[/math]

This requires calculation of [math]\displaystyle{ \sigma^2_{y} }[/math], the variance of the average intensities across the unique reflections of a resolution shell, and [math]\displaystyle{ \sigma^2_{\epsilon} }[/math], the average of all sample variances of the averaged (merged) intensities across all unique reflections of a resolution shell.

## Method

** [math]\displaystyle{ \sigma^2_{\epsilon} }[/math]**

With [math]\displaystyle{ x_{j,i} }[/math] , a single observation j of all observations n of one reflection i, the average of all sample variances of the mean across all unique reflections of a resolution shell is obtained by calculating the unbiased sample variance of the mean for every unique reflection i by:

[math]\displaystyle{ s^2_{\epsilon i} = \frac{1}{n_{i}-1} \cdot \left ( \sum^{n_{i}}_{j} x^2_{j,i} - \frac{\left ( \sum^{n_{i}}_{j}x_{j,i} \right )^2}{n_{i}} \right ) / \frac{n_{i}}{2} }[/math]

[math]\displaystyle{ s^2_{\epsilon i} }[/math] is divided by the factor [math]\displaystyle{ \frac{n}{2} }[/math], because the variance of the sample mean (intensities of the merged observations) is the quantity of interest. The division by **n/2** takes care of providing the variance of the mean ([1]) (merged) intensity of the **half**-datasets, as defined in Karplus and Diederichs (2012). These "variances of means" are averaged over all unique reflections of the resolution shell:

[math]\displaystyle{ \sigma^2_{\epsilon}=\sum^N_{i} \sigma^2_{\epsilon i} / N }[/math]

If the standard deviations [math]\displaystyle{ \sigma_{int\_j,i} }[/math] for the single observations are considered as weights for the CC_{1/2} calculation, with [math]\displaystyle{ w_{j,i}=\frac{1}{\sigma_{int\_j,i}^2} }[/math] the unbiased **weighted** sample variance of the mean for every unique reflection i is obtained by:

[math]\displaystyle{ s^2_{\epsilon i\_w} = \frac{n_{i}}{n_{i}-1} \cdot \left ( \frac{\sum^{n_{i}}_{j}w_{j,i} x^2_{j,i}}{\sum^{n_{i}}_{j}w_{j,i}} -\left ( \frac{ \sum^{n_{i}}_{j}w_{j,i}x_{j,i} }{\sum^{n_{i}}_{j}w_{j,i}}\right )^2 \right ) / \frac{n_{i}}{2} }[/math]

These " weighted variances of means" are averaged over all unique reflections of the resolution shell:

[math]\displaystyle{ s^2_{\epsilon_w}=\sum^N_{i} s^2_{\epsilon i\_w} / N }[/math]

It should be noted that it is not straightforward to define the correct way to calculate a weighted variance (and the weighted variance of the mean). The formula [math]\displaystyle{ s^2_w = \frac{n_{i}}{n_{i}-1} \cdot \left ( \frac{\sum^{n_{i}}_{j}w_{j,i} x^2_{j,i}}{\sum^{n_{i}}_{j}w_{j,i}} -\left ( \frac{ \sum^{n_{i}}_{j}w_{j,i}x_{j,i} }{\sum^{n_{i}}_{j}w_{j,i}}\right )^2 \right ) }[/math] is - after some manipulation - the same as that found at [2],[3].

** [math]\displaystyle{ \sigma^2_{y} }[/math]**

Let N be the number of reflections in a resolution shell. With [math]\displaystyle{ \overline{x}_{i}= \sum^n_{j} x_{j,i} }[/math] , the unbiased sample variance from all averaged intensities of all unique reflections is calculated by:

[math]\displaystyle{ \sigma^2_{y} = \frac{1}{N-1} \cdot \left (\sum^N_{i} \overline{x}_{i}^2 - \frac{\left ( \sum^N_{i} \overline{x}_{i} \right )^2}{ N} \right ) }[/math]

If the standard deviations [math]\displaystyle{ \sigma_{int\_j,i} }[/math] for the single observations are considered as weights, [math]\displaystyle{ \sigma^2_{y} }[/math] is obtained from [math]\displaystyle{ \overline{x}_{i_w} = \frac{\sum^n_{j} w_{j,i} x_{j,i}} {\sum^n_{j}w_{j,i}} }[/math]:

[math]\displaystyle{ \sigma^2_{y_w} = \frac{1}{N-1} \cdot \left (\sum^N_{i} \overline{x}_{i_w}^2 - \frac{\left ( \sum^N_{i} \overline{x}_{i_w} \right )^2}{ N} \right ) }[/math]

## Example

An example is shown for a very simplified data file (unmerged ASCII.HKL).

First reflection with 6 observations: h k l int σ(int) 2 0 0 9.156E+02 3.686E+00 1 0 2 0 5.584E+02 3.093E+00 1 0 0 2 6.301E+02 2.405E+01 1 2 0 0 9.256E+02 3.686E+00 2 0 2 0 2.584E+02 3.093E+00 2 0 0 2 7.301E+02 2.405E+01 2

[math]\displaystyle{ \overline{x}_{1} }[/math] , the average intensity of all observations of this reflection = 669.6999

[math]\displaystyle{ \sigma^2_{\epsilon 1} }[/math], the unbiased sample variance of the mean of all observations of this unique reflection = 62544.6597/(n/2) = 20848.2198

Second reflection with 6 observations: h k l int σ(int) 1 1 2 2.395E+01 8.932E+01 1 1 2 1 9.065E+01 7.407E+00 1 2 1 1 5.981E+01 9.125E+00 1 1 1 2 3.395E+01 8.932E+01 2 1 2 1 9.065E+01 7.407E+00 2 2 1 1 1.608E+01 2.215E+01 2

[math]\displaystyle{ \overline{x}_{2} }[/math] , the average intensity of all observations of this reflection = 52.5150

[math]\displaystyle{ \sigma^2_{\epsilon 2} }[/math], the unbiased sample variance of the mean of all observations of this unique reflection = 1089.9803/(n/2) = 363.3267 1089.9803/(n/2)

[math]\displaystyle{ \sigma^2_{\epsilon} }[/math] , the average of all the [math]\displaystyle{ \sigma^2_{\epsilon i} }[/math] = 10605.7733

[math]\displaystyle{ \sigma^2_{y} }[/math], the variance of all the averaged intensities = 190458.6533

As a result of these calculations CC_{1/2} = (190458.6533-(0.5*10605.7733))/(190458.6533+(0.5*10605.7733)), which results in 0.9458.

The described calculation is implemented in XDSCC12, and CC_{1/2} and ΔCC_{1/2} can be calculated for XDS_ASCII.HKL and XSCALE.HKL files.

## Fortran 95 code that assumes that all unique reflections have the same number of observations

sumibar=0 sumibar2=0 sumsig2eps=0 DO i=1,nref xbar=SUM(iobs(:,i))/nobs sumibar=sumi+xbar sumibar2=sumibar2+xbar**2 sumsig2eps=sumeps + (SUM(iobs(:,i)**2)-xbar**2*nobs)/(nobs-1)/(nobs/2) END DO sig2y=(sumibar2-sumibar**2/nref)/(nref-1) sig2eps=sumsig2eps/nref print *,(sig2y-0.5*sig2eps)/(sig2y+0.5*sig2eps)

## number of reflection pairs

CORRECT.LP and XSCALE.LP do not explicitly state the *number of reflection pairs* that were used to calculated CC_{1/2}.

However, the number can be calculated from the numbers available, for each resolution shell: there is the NUMBER OF UNIQUE REFLECTIONS (X), the NUMBER OF OBSERVED REFLECTIONS (Y), and the number of COMPARED reflections (Z) - the latter number is the total number of unmerged observations that contributed to the CC_{1/2} and the R-value calculations.

The *number of reflections pairs* that were used for the CC_{1/2} calculation can therefore be obtained as follows: Y-Z gives the number of unique reflections that have a single observation. The remaining (X-Y+Z) unique reflections have multiple observations, i.e. there were (X-Y+Z) reflection pairs that went into CC_{1/2}.

## why CC_{1/2} can be negative

If the numerator of the formula becomes negative, CC_{1/2} is negative. This happens if the variance of the average intensities across the unique reflections of a resolution shell is low, but the individual measurements of each unique reflection vary strongly. This is discussed in §4.1 of Assmann, G., Brehm, W. and Diederichs, K. (2016) Identification of rogue datasets in serial crystallography (2016) J. Appl. Cryst. 49, 1021-1028.

## Implementation

This way of calculating CC_{1/2} is implemented in XDSCC12 and in partialator of CrystFEL.