|
|
Line 1: |
Line 1: |
| ΔCC12 is a quantity, that detects datasets/frames, that are non-isomorphous. As described in [https://scripts.iucr.org/cgi-bin/paper?zw5005 Assmann and Diederichs (2016)], Δcc12 is calculated with the σ-τ method. This method is a way to calculate the Pearson correlation coefficient for the special case of two sets of values (intensities) that randomly deviate from their true values, but is not influenced by a random number sequence as shown in [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3457925/ Karplus and Diederichs (2012)]. For the σ-τ method CC12 is calculated for all datasets/frames, which will be called CC12_overall (?) and CC12 is calculated for all datasets/frames except for one dataset i, which is omitted from calculations and denoted as CC12_i. The difference of the two quantities is Δcc12.
| | ΔCC<sub>1/2</sub> is a quantity that detects non-isomorphous datasets/frames. As described in [https://scripts.iucr.org/cgi-bin/paper?zw5005 Assmann and Diederichs (2016)], ΔCC<sub>1/2</sub> is calculated with the σ-τ method. This method calculates the Pearson correlation coefficient for the special case of two sets of values (intensities) that randomly deviate from their true values, and it is not influenced by a random number sequence, as shown in [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3457925/ Karplus and Diederichs (2012)]. With the σ-τ method, CC<sub>1/2</sub> is calculated once for all datasets/frames, giving CC<sub>1/2_overall</sub>, and once for all datasets/frames except one dataset i, which is omitted from the calculation, giving CC<sub>1/2_i</sub>. The difference of the two quantities is ΔCC<sub>1/2</sub>. |
| | |
| : <math>\Delta CC_{1/2}= CC_{1/2 overall}-CC_{1/2 i} </math> | | : <math>\Delta CC_{1/2}= CC_{1/2 overall}-CC_{1/2 i} </math> |
|
| |
|
| If ΔCC12 is > 0 -CC12overall is bigger than CC12i- that means if omitting dataset i from calculations, a lower CC12 results, which is why we want to keep it. Thus it is improving the whole merged dataset. If ΔCC12 is < 0, -CC12overall is smaller than CC12i- that means that by omitting dataset i from calculations a higher CC12 results, which is why we want to exclude it from calculations, because it is impairing the whole merged dataset. CC12 is calculated by: | | If ΔCC<sub>1/2</sub> > 0, CC<sub>1/2_overall</sub> is larger than CC<sub>1/2_i</sub>: omitting dataset i from the calculation lowers CC<sub>1/2</sub>, so dataset i improves the merged dataset and should be kept. If ΔCC<sub>1/2</sub> < 0, CC<sub>1/2_overall</sub> is smaller than CC<sub>1/2_i</sub>: omitting dataset i raises CC<sub>1/2</sub>, so dataset i impairs the merged dataset and should be excluded. |
| | |
| : <math>CC_{1/2}=\frac{\sigma^2_{\tau}}{\sigma^2_{\tau}+\sigma^2_{\epsilon}} =\frac{\sigma^2_{y}- \frac{1}{2}\sigma^2_{\epsilon}}{\sigma^2_{y}+ \frac{1}{2}\sigma^2_{\epsilon}} </math>
| |
| | |
| This requires calculation of <math>\sigma^2_{y} </math>, the variance of the average intensities across the unique reflections of a resolution shell, and <math>\sigma^2_{\epsilon} </math>, the average of all sample variances of the mean across all unique reflections of a resolution shell.
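As a minimal sketch, the second form of the formula can be written as a small Python function. The function name <code>cc_half</code> and the input numbers are illustrative assumptions, not part of any existing program.

```python
def cc_half(sigma2_y: float, sigma2_eps: float) -> float:
    """CC1/2 from the sigma-tau formula.

    sigma2_y   : variance of the average intensities in a resolution shell
    sigma2_eps : average variance of the mean over the unique reflections
    """
    half_eps = 0.5 * sigma2_eps
    return (sigma2_y - half_eps) / (sigma2_y + half_eps)


# Arbitrary illustrative numbers, not taken from real data.
print(cc_half(100.0, 10.0))
```

Comparing the two forms of the formula shows that the first form corresponds to <math>\sigma^2_{\tau} = \sigma^2_{y} - \frac{1}{2}\sigma^2_{\epsilon}</math>, so only the two variances named above have to be computed.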
| |
| | |
| == Implementation ==
| |
| | |
| ===''' <math>\sigma^2_{\epsilon} </math>''' - unweighted===
| |
| | |
| The average of all sample variances of the mean across all unique reflections of a resolution shell is obtained by calculating the sample variance of the mean for every unique reflection i by:
| |
| | |
| <math>\sigma^2_{\epsilon i} = \frac{1}{n-1} \cdot \left ( \sum^n_{j} x^2_{j} - \frac{\left ( \sum^n_{j}x_{j} \right )^2}{ n} \right ) / \frac{n}{2} </math> | |
| | |
| Here <math>x_{j} </math> is a single observation j out of the n observations of one reflection i. The sample variance is divided by <math>\frac{n}{2} </math> because the quantity of interest is the variance of the sample mean (the merged intensity) of a half-dataset, which contains n/2 of the n observations; see [https://en.wikipedia.org/wiki/Sample_mean_and_covariance#Variance_of_the_sample_mean variance of the sample mean]. These "variances of means" are averaged over all N unique reflections of the resolution shell:
| |
| | |
| <math>\sum^N_{i} \sigma^2_{\epsilon i} / N </math>
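The two steps above (per-reflection variance of the mean, then the average over the shell) could be sketched as follows. The function names and the input layout (one list of intensities per unique reflection) are assumptions made for illustration, and reflections with fewer than two observations are skipped because no sample variance exists for them.

```python
def variance_of_half_dataset_mean(obs):
    """Unbiased sample variance of the observations, divided by n/2."""
    n = len(obs)
    sum_x = sum(obs)
    sum_x2 = sum(x * x for x in obs)
    sample_var = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return sample_var / (n / 2)


def sigma2_eps(reflections):
    """Average of the per-reflection values over one resolution shell."""
    per_reflection = [
        variance_of_half_dataset_mean(obs)
        for obs in reflections
        if len(obs) >= 2  # a sample variance needs at least two observations
    ]
    return sum(per_reflection) / len(per_reflection)


# Hypothetical intensities: two unique reflections with 4 and 2 observations.
shell = [[10.0, 12.0, 11.0, 13.0], [100.0, 104.0]]
print(sigma2_eps(shell))
```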
| |
| | |
| | |
| ----
| |
| | |
| ===''' <math>\sigma^2_{y} </math>'''===
| |
| | |
| The unbiased sample variance from all averaged intensities of all unique reflections is calculated by:
| |
| | |
| <math>\sigma^2_{y} = \frac{1}{N-1} \cdot \left ( \sum^N_{i} \overline{x}_{i}^2 - \frac{\left ( \sum^N_{i} \overline{x}_{i} \right )^2}{ N} \right ) </math>
| |
| |
| Here <math>\overline{x}_{i}= \frac{1}{n} \sum^n_{j} x_{j}</math> is the average intensity of all n observations, from all frames/crystals, of one unique reflection i. The sum runs over all N unique reflections of the resolution shell.
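A short sketch of this calculation, again assuming one list of intensities per unique reflection (the name <code>sigma2_y</code> and the numbers are illustrative only):

```python
def sigma2_y(reflections):
    """Unbiased sample variance of the per-reflection average intensities."""
    means = [sum(obs) / len(obs) for obs in reflections]
    n_refl = len(means)  # N, the number of unique reflections in the shell
    sum_m = sum(means)
    sum_m2 = sum(m * m for m in means)
    return (sum_m2 - sum_m ** 2 / n_refl) / (n_refl - 1)


# Hypothetical intensities: three unique reflections with two observations each.
shell = [[1.0, 3.0], [4.0, 6.0], [8.0, 10.0]]
print(sigma2_y(shell))
```

The shell must contain at least two unique reflections, otherwise the division by N-1 is undefined.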
| |
| | |
| | |
| == Example ==
| |
| The calculation is illustrated with a very simplified data file (unmerged ASCII.HKL). Only two datasets (frames/crystals) are considered, and the diffraction pattern consists of only two unique reflections, each with three observations per dataset, i.e. six observations in total.
| |
| | |
| <pre> | |
| First reflection with 6 observations:
| |
| h k l int σ(int) #dataset
| |
| 2 0 0 9.156E+02 3.686E+00 1
| |
| 0 2 0 5.584E+02 3.093E+00 1
| |
| 0 0 2 6.301E+02 2.405E+01 1
| |
| 2 0 0 9.256E+02 3.686E+00 2
| |
| 0 2 0 2.584E+02 3.093E+00 2
| |
| 0 0 2 7.301E+02 2.405E+01 2
| |
| </pre>
| |
| <math>\overline{x}_{i} </math>, the average intensity of all observations from all frames/crystals of this reflection = 669.6999
| |
| |
| <math>\sigma^2_{\epsilon i} </math>, the unbiased sample variance of the mean of all observations of this unique reflection i = 20848.2198 (62544.6597/(n/2), with n = 6)
| |
| | |
|
| |
| <pre>
| |
| Second reflection with 6 observations:
| |
| h k l int σ(int) #dataset
| |
| 1 1 2 2.395E+01 8.932E+01 1
| |
| 1 2 1 9.065E+01 7.407E+00 1
| |
| 2 1 1 5.981E+01 9.125E+00 1
| |
| 1 1 2 3.395E+01 8.932E+01 2
| |
| 1 2 1 9.065E+01 7.407E+00 2
| |
| 2 1 1 1.608E+01 2.215E+01 2
| |
| </pre> | |
| <math>\overline{x}_{i} </math>, the average intensity of all observations from all frames/crystals of this reflection = 52.5150
| |
| |
| <math>\sigma^2_{\epsilon i} </math>, the unbiased sample variance of the mean of all observations of this unique reflection i = 363.3267 (1089.9803/(n/2), with n = 6)
| |
| | |
| | |
| <math>\sigma^2_{\epsilon} </math> , the average of all the <math>\sigma^2_{\epsilon i} </math> = 10605.7733
| |
| | |
| <math>\sigma^2_{y} </math>, the variance of all the averaged intensities = 190458.6533
| |
| | |
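| As a result of these calculations, <math>CC_{1/2} = \frac{190458.6533 - \frac{1}{2} \cdot 10605.7733}{190458.6533 + \frac{1}{2} \cdot 10605.7733} \approx 0.9458</math>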
| |
|
| |
|
|
| |
|
| == Program == | | == Program == |
ΔCC1/2 is a quantity that detects non-isomorphous datasets/frames. As described in Assmann and Diederichs (2016), ΔCC1/2 is calculated with the σ-τ method. This method calculates the Pearson correlation coefficient for the special case of two sets of values (intensities) that randomly deviate from their true values, and it is not influenced by a random number sequence, as shown in Karplus and Diederichs (2012). With the σ-τ method, CC1/2 is calculated once for all datasets/frames, giving CC1/2_overall, and once for all datasets/frames except one dataset i, which is omitted from the calculation, giving CC1/2_i. The difference of the two quantities is ΔCC1/2.
- [math]\displaystyle{ \Delta CC_{1/2}= CC_{1/2 overall}-CC_{1/2 i} }[/math]
If ΔCC1/2 > 0, CC1/2_overall is larger than CC1/2_i: omitting dataset i from the calculation lowers CC1/2, so dataset i improves the merged dataset and should be kept. If ΔCC1/2 < 0, CC1/2_overall is smaller than CC1/2_i: omitting dataset i raises CC1/2, so dataset i impairs the merged dataset and should be excluded.
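The leave-one-out procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program itself: the intensities are those of the simplified two-dataset example in the earlier revision of this page, the grouping into "refl_1" and "refl_2" and all function names are hypothetical, and a single resolution shell is assumed.

```python
from statistics import mean, variance  # variance() is the unbiased sample variance

# {unique reflection: [(intensity, dataset id), ...]}; the grouping of
# symmetry-equivalent observations is assumed to be done already.
OBS = {
    "refl_1": [(915.6, 1), (558.4, 1), (630.1, 1), (925.6, 2), (258.4, 2), (730.1, 2)],
    "refl_2": [(23.95, 1), (90.65, 1), (59.81, 1), (33.95, 2), (90.65, 2), (16.08, 2)],
}


def cc_half(obs, omit=None):
    """Sigma-tau CC1/2 of one shell, optionally with one dataset omitted."""
    groups = [[x for x, ds in refl if ds != omit] for refl in obs.values()]
    groups = [g for g in groups if len(g) >= 2]  # a variance needs n >= 2
    sigma2_eps = mean([variance(g) / (len(g) / 2) for g in groups])
    sigma2_y = variance([mean(g) for g in groups])
    return (sigma2_y - 0.5 * sigma2_eps) / (sigma2_y + 0.5 * sigma2_eps)


def delta_cc_half(obs):
    """Delta CC1/2 = CC1/2_overall - CC1/2_i for every dataset i."""
    datasets = sorted({ds for refl in obs.values() for _, ds in refl})
    overall = cc_half(obs)
    return {ds: overall - cc_half(obs, omit=ds) for ds in datasets}


for ds, delta in delta_cc_half(OBS).items():
    verdict = "keep" if delta > 0 else "candidate for exclusion"
    print(f"dataset {ds}: delta CC1/2 = {delta:+.4f} ({verdict})")
```

With real data the same calculation would be carried out per resolution shell, and how the per-shell values are combined is not covered by this sketch.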
Program