# CC1/2: Difference between revisions

No edit summary |
No edit summary |
||

Line 1: | Line 1: | ||

==CC<sub>1/2</sub> calculation== | ==CC<sub>1/2</sub> calculation== | ||

CC<sub>1/2</sub> | CC<sub>1/2</sub> can be calculated with the so-called σ-τ method ([https://cms.uni-konstanz.de/index.php?eID=tx_nawsecuredl&u=0&g=0&t=1475179096&hash=5cf64234a23a794a1894c5408384c57208d7b602&file=fileadmin/biologie/ag-strucbio/pdfs/Assman2016_JApplCryst.pdf Assmann, G., Brehm, W. and Diederichs, K. (2016) Identification of rogue datasets in serial crystallography (2016) J. Appl. Cryst. 49, 1021-1028.]) by: | ||

: <math>CC_{1/2}=\frac{\sigma^2_{\tau}}{\sigma^2_{\tau}+\sigma^2_{\epsilon}} =\frac{\sigma^2_{y}- \frac{1}{2}\sigma^2_{\epsilon}}{\sigma^2_{y}+ \frac{1}{2}\sigma^2_{\epsilon}} </math> | : <math>CC_{1/2}=\frac{\sigma^2_{\tau}}{\sigma^2_{\tau}+\sigma^2_{\epsilon}} =\frac{\sigma^2_{y}- \frac{1}{2}\sigma^2_{\epsilon}}{\sigma^2_{y}+ \frac{1}{2}\sigma^2_{\epsilon}} </math> | ||

This requires calculation of <math>\sigma^2_{y} </math>, the variance of the average intensities across the unique reflections of a resolution shell, and <math>\sigma^2_{\epsilon} </math>, the average of all sample variances of the | This requires calculation of <math>\sigma^2_{y} </math>, the variance of the average intensities across the unique reflections of a resolution shell, and <math>\sigma^2_{\epsilon} </math>, the average of all sample variances of the averaged (merged) intensities across all unique reflections of a resolution shell. | ||

== | == Method == | ||

===''' <math>\sigma^2_{\epsilon} </math>'''=== | ===''' <math>\sigma^2_{\epsilon} </math>'''=== | ||

With <math>x_{j,i} </math> , a single observation j of all observations n of one reflection i, the average of all sample variances of the mean across all unique reflections of a resolution shell is obtained by calculating the sample variance of the mean for every unique reflection i by: | |||

<math>\sigma^2_{\epsilon i} = \frac{1}{ | <math>\sigma^2_{\epsilon i} = \frac{1}{n_{i}-1} \cdot \left ( \sum^{n_{i}}_{j} x^2_{j,i} - \frac{\left ( \sum^{n_{i}}_{j}x_{j,i} \right )^2}{n_{i}} \right ) / \frac{n_{i}}{2} </math> | ||

<math>\sigma^2_{\epsilon i} </math> is then divided by the factor <math>\frac{n}{2} </math>, because the variance of the sample mean (intensities of the merged observations) is the quantity of interest. The division by '''n/2''' takes care of providing the variance of the mean ([https://en.wikipedia.org/wiki/Sample_mean_and_covariance#Variance_of_the_sample_mean ]) (merged) intensity of the '''half'''-datasets, as defined in [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3457925/ Karplus and Diederichs (2012)]. These "variances of means" are averaged over all unique reflections of the resolution shell: | |||

<math>\sigma^2_{\epsilon}=\sum^N_{i} \sigma^2_{\epsilon i} / N </math> | <math>\sigma^2_{\epsilon}=\sum^N_{i} \sigma^2_{\epsilon i} / N </math> | ||

Line 37: | Line 25: | ||

===''' <math>\sigma^2_{y} </math>'''=== | ===''' <math>\sigma^2_{y} </math>'''=== | ||

Let N be the number of reflections. With <math>\overline{x}_{i}= \sum^n_{j} x_{j,i}</math> , the unbiased sample variance from all averaged intensities of all unique reflections is calculated by: | |||

<math>\sigma^2_{y} = \frac{1}{N-1} \cdot \left ( \sum^N_{i} \overline{x}_{i}^2 - \frac{\left ( \sum^N_{i} \overline{x}_{i} \right )^2}{ N} \right ) </math> | <math>\sigma^2_{y} = \frac{1}{N-1} \cdot \left ( \sum^N_{i} \overline{x}_{i}^2 - \frac{\left ( \sum^N_{i} \overline{x}_{i} \right )^2}{ N} \right ) </math> | ||

Line 57: | Line 43: | ||

0 0 2 7.301E+02 2.405E+01 2 | 0 0 2 7.301E+02 2.405E+01 2 | ||

</pre> | </pre> | ||

<math>\overline{x}_{ | <math>\overline{x}_{1} </math> , the average intensity of all observations of this reflection = 669.6999 | ||

<math>\sigma^2_{\epsilon | <math>\sigma^2_{\epsilon 1} </math>, the unbiased sample variance of the mean of all observations of this unique reflection = 20848.2198 (62544.6597/(n/2)) | ||

Line 72: | Line 58: | ||

2 1 1 1.608E+01 2.215E+01 2 | 2 1 1 1.608E+01 2.215E+01 2 | ||

</pre> | </pre> | ||

<math>\overline{x}_{ | <math>\overline{x}_{2} </math> , the average intensity of all observations of this reflection = 52.5150 | ||

<math>\sigma^2_{\epsilon | <math>\sigma^2_{\epsilon 2} </math>, the unbiased sample variance of the mean of all observations of this unique reflection = 363.3267 (1089.9803/(n/2)) | ||

Line 83: | Line 69: | ||

As a result of these calculations CC<sub>1/2</sub> = 0.9458 ((190458.6533-(0.5*10605.7733))/(190458.6533+(0.5*10605.7733)) | As a result of these calculations CC<sub>1/2</sub> = 0.9458 ((190458.6533-(0.5*10605.7733))/(190458.6533+(0.5*10605.7733)) | ||

The described calculation is implemented in [[XDSCC12]] and CC<sub>1/2</sub> and ΔCC<sub>1/2</sub> | The described calculation is implemented in [[XDSCC12]], and CC<sub>1/2</sub> and [[DeltaCC12|ΔCC<sub>1/2</sub>]] can be calculated for XDS_ASCII.HKL and XSCALE.HKL files. | ||

==Fortran 95 code that assumes that all unique reflections have the same number of observations== | |||

<pre> | |||

sumibar=0 | |||

sumibar2=0 | |||

sumsig2eps=0 | |||

DO i=1,nref | |||

xbar=SUM(iobs(:,i))/nobs | |||

sumibar=sumi+xbar | |||

sumibar2=sumibar2+xbar**2 | |||

sumsig2eps=sumeps + (SUM(iobs(:,i)**2)-xbar**2*nobs)/(nobs-1)/(nobs/2) | |||

END DO | |||

sig2y=(sumibar2-sumibar**2/nref)/(nref-1) | |||

sig2eps=sumsig2eps/nref | |||

print *,(sig2y-0.5*sig2eps)/(sig2y+0.5*sig2eps) | |||

</pre> | |||

== number of reflection pairs == | |||

[[CORRECT.LP]] and XSCALE.LP do not explicitly state the ''number of reflection pairs'' that were used to calculated CC<sub>1/2</sub>. | |||

However, the number can be calculated from the numbers available, for each resolution shell: there is the NUMBER OF UNIQUE REFLECTIONS (X), the NUMBER OF OBSERVED REFLECTIONS (Y), and the number of COMPARED reflections (Z) - the latter number is the total number of unmerged observations that contributed to the CC<sub>1/2</sub> and the R-value calculations. | |||

The ''number of reflections pairs'' that were used for the CC<sub>1/2</sub> calculation can therefore be obtained as follows: Y-Z gives the number of unique reflections that have a single observation. The remaining (X-Y+Z) unique reflections have multiple observations, i.e. there were (X-Y+Z) reflection pairs that went into CC<sub>1/2</sub>. | |||

== why CC<sub>1/2</sub> can be negative == | |||

There is a mathematical reason, explained in §4.1 of [https://cms.uni-konstanz.de/index.php?eID=tx_nawsecuredl&u=0&g=0&t=1475179096&hash=5cf64234a23a794a1894c5408384c57208d7b602&file=fileadmin/biologie/ag-strucbio/pdfs/Assman2016_JApplCryst.pdf Assmann, G., Brehm, W. and Diederichs, K. (2016) Identification of rogue datasets in serial crystallography (2016) J. Appl. Cryst. 49, 1021-1028.] |

## Revision as of 12:00, 6 September 2018

## CC_{1/2} calculation

CC_{1/2} can be calculated with the so-called σ-τ method (Assmann, G., Brehm, W. and Diederichs, K. (2016) Identification of rogue datasets in serial crystallography (2016) J. Appl. Cryst. 49, 1021-1028.) by:

- [math]\displaystyle{ CC_{1/2}=\frac{\sigma^2_{\tau}}{\sigma^2_{\tau}+\sigma^2_{\epsilon}} =\frac{\sigma^2_{y}- \frac{1}{2}\sigma^2_{\epsilon}}{\sigma^2_{y}+ \frac{1}{2}\sigma^2_{\epsilon}} }[/math]

This requires calculation of [math]\displaystyle{ \sigma^2_{y} }[/math], the variance of the average intensities across the unique reflections of a resolution shell, and [math]\displaystyle{ \sigma^2_{\epsilon} }[/math], the average of all sample variances of the averaged (merged) intensities across all unique reflections of a resolution shell.

## Method

** [math]\displaystyle{ \sigma^2_{\epsilon} }[/math]**

With [math]\displaystyle{ x_{j,i} }[/math] , a single observation j of all observations n of one reflection i, the average of all sample variances of the mean across all unique reflections of a resolution shell is obtained by calculating the sample variance of the mean for every unique reflection i by:

[math]\displaystyle{ \sigma^2_{\epsilon i} = \frac{1}{n_{i}-1} \cdot \left ( \sum^{n_{i}}_{j} x^2_{j,i} - \frac{\left ( \sum^{n_{i}}_{j}x_{j,i} \right )^2}{n_{i}} \right ) / \frac{n_{i}}{2} }[/math]

[math]\displaystyle{ \sigma^2_{\epsilon i} }[/math] is then divided by the factor [math]\displaystyle{ \frac{n}{2} }[/math], because the variance of the sample mean (intensities of the merged observations) is the quantity of interest. The division by **n/2** takes care of providing the variance of the mean ([1]) (merged) intensity of the **half**-datasets, as defined in Karplus and Diederichs (2012). These "variances of means" are averaged over all unique reflections of the resolution shell:

[math]\displaystyle{ \sigma^2_{\epsilon}=\sum^N_{i} \sigma^2_{\epsilon i} / N }[/math]

** [math]\displaystyle{ \sigma^2_{y} }[/math]**

Let N be the number of reflections. With [math]\displaystyle{ \overline{x}_{i}= \sum^n_{j} x_{j,i} }[/math] , the unbiased sample variance from all averaged intensities of all unique reflections is calculated by:

[math]\displaystyle{ \sigma^2_{y} = \frac{1}{N-1} \cdot \left ( \sum^N_{i} \overline{x}_{i}^2 - \frac{\left ( \sum^N_{i} \overline{x}_{i} \right )^2}{ N} \right ) }[/math]

## Example

An example is shown for a very simplified data file (unmerged ASCII.HKL).

First reflection with 6 observations: h k l int σ(int) 2 0 0 9.156E+02 3.686E+00 1 0 2 0 5.584E+02 3.093E+00 1 0 0 2 6.301E+02 2.405E+01 1 2 0 0 9.256E+02 3.686E+00 2 0 2 0 2.584E+02 3.093E+00 2 0 0 2 7.301E+02 2.405E+01 2

[math]\displaystyle{ \overline{x}_{1} }[/math] , the average intensity of all observations of this reflection = 669.6999

[math]\displaystyle{ \sigma^2_{\epsilon 1} }[/math], the unbiased sample variance of the mean of all observations of this unique reflection = 20848.2198 (62544.6597/(n/2))

Second reflection with 6 observations: h k l int σ(int) 1 1 2 2.395E+01 8.932E+01 1 1 2 1 9.065E+01 7.407E+00 1 2 1 1 5.981E+01 9.125E+00 1 1 1 2 3.395E+01 8.932E+01 2 1 2 1 9.065E+01 7.407E+00 2 2 1 1 1.608E+01 2.215E+01 2

[math]\displaystyle{ \overline{x}_{2} }[/math] , the average intensity of all observations of this reflection = 52.5150

[math]\displaystyle{ \sigma^2_{\epsilon 2} }[/math], the unbiased sample variance of the mean of all observations of this unique reflection = 363.3267 (1089.9803/(n/2))

[math]\displaystyle{ \sigma^2_{\epsilon} }[/math] , the average of all the [math]\displaystyle{ \sigma^2_{\epsilon i} }[/math] = 10605.7733

[math]\displaystyle{ \sigma^2_{y} }[/math], the variance of all the averaged intensities = 190458.6533

As a result of these calculations CC_{1/2} = 0.9458 ((190458.6533-(0.5*10605.7733))/(190458.6533+(0.5*10605.7733))

The described calculation is implemented in XDSCC12, and CC_{1/2} and ΔCC_{1/2} can be calculated for XDS_ASCII.HKL and XSCALE.HKL files.

## Fortran 95 code that assumes that all unique reflections have the same number of observations

sumibar=0 sumibar2=0 sumsig2eps=0 DO i=1,nref xbar=SUM(iobs(:,i))/nobs sumibar=sumi+xbar sumibar2=sumibar2+xbar**2 sumsig2eps=sumeps + (SUM(iobs(:,i)**2)-xbar**2*nobs)/(nobs-1)/(nobs/2) END DO sig2y=(sumibar2-sumibar**2/nref)/(nref-1) sig2eps=sumsig2eps/nref print *,(sig2y-0.5*sig2eps)/(sig2y+0.5*sig2eps)

## number of reflection pairs

CORRECT.LP and XSCALE.LP do not explicitly state the *number of reflection pairs* that were used to calculated CC_{1/2}.

However, the number can be calculated from the numbers available, for each resolution shell: there is the NUMBER OF UNIQUE REFLECTIONS (X), the NUMBER OF OBSERVED REFLECTIONS (Y), and the number of COMPARED reflections (Z) - the latter number is the total number of unmerged observations that contributed to the CC_{1/2} and the R-value calculations.

The *number of reflections pairs* that were used for the CC_{1/2} calculation can therefore be obtained as follows: Y-Z gives the number of unique reflections that have a single observation. The remaining (X-Y+Z) unique reflections have multiple observations, i.e. there were (X-Y+Z) reflection pairs that went into CC_{1/2}.

## why CC_{1/2} can be negative

There is a mathematical reason, explained in §4.1 of Assmann, G., Brehm, W. and Diederichs, K. (2016) Identification of rogue datasets in serial crystallography (2016) J. Appl. Cryst. 49, 1021-1028.