DeltaCC12

ΔCC<sub>1/2</sub> is a quantity used to detect non-isomorphous datasets or frames. As described in [https://scripts.iucr.org/cgi-bin/paper?zw5005 Assmann and Diederichs (2016)], ΔCC<sub>1/2</sub> is calculated with the σ-τ method. This method calculates the Pearson correlation coefficient for the special case of two sets of values (intensities) that deviate randomly from their true values. Unlike the calculation of CC<sub>1/2</sub> from two randomly selected half-datasets introduced in [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3457925/ Karplus and Diederichs (2012)], the σ-τ method does not depend on a random number sequence. CC<sub>1/2</sub> is calculated once from all datasets/frames, denoted CC<sub>1/2,overall</sub>, and once from all datasets/frames except dataset i, which is omitted from the calculation; the latter is denoted CC<sub>1/2,i</sub>. ΔCC<sub>1/2</sub> is the difference of the two quantities:

: <math>\Delta CC_{1/2} = CC_{1/2,\mathrm{overall}} - CC_{1/2,i}</math>

If ΔCC<sub>1/2</sub> > 0 (CC<sub>1/2,overall</sub> is larger than CC<sub>1/2,i</sub>), omitting dataset i from the calculation lowers CC<sub>1/2</sub>. Since CC<sub>1/2</sub> should be as high as possible, dataset i improves the merged data and is kept. If ΔCC<sub>1/2</sub> < 0 (CC<sub>1/2,overall</sub> is smaller than CC<sub>1/2,i</sub>), omitting dataset i raises CC<sub>1/2</sub>; dataset i impairs the merged data and should be excluded.

With the σ-τ method, CC<sub>1/2</sub> is calculated by:

: <math>CC_{1/2}=\frac{\sigma^2_{\tau}}{\sigma^2_{\tau}+\sigma^2_{\epsilon}} =\frac{\sigma^2_{y}- \frac{1}{2}\sigma^2_{\epsilon}}{\sigma^2_{y}+ \frac{1}{2}\sigma^2_{\epsilon}}</math>

where <math>\sigma^2_{\tau}</math> is the variance of the true (error-free) intensities.
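The step from the first to the second form can be sketched as follows, assuming (as the σ-τ method does) that each half-dataset average equals the true intensity plus an independent random error of variance <math>\sigma^2_{\epsilon}</math>. The correlation of two such half-dataset averages is their covariance <math>\sigma^2_{\tau}</math> divided by their common variance <math>\sigma^2_{\tau}+\sigma^2_{\epsilon}</math>, which is the first form. The average y of both halves has error variance <math>\frac{1}{2}\sigma^2_{\epsilon}</math>, so:

: <math>\sigma^2_{y} = \sigma^2_{\tau} + \frac{1}{2}\sigma^2_{\epsilon} \quad\Rightarrow\quad \sigma^2_{\tau} = \sigma^2_{y} - \frac{1}{2}\sigma^2_{\epsilon}, \qquad \sigma^2_{\tau}+\sigma^2_{\epsilon} = \sigma^2_{y} + \frac{1}{2}\sigma^2_{\epsilon}</math>

Substituting both expressions into the first form gives the second, which only contains quantities that can be estimated from the measured intensities.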

This requires <math>\sigma^2_{y}</math>, the variance of the average intensities across the unique reflections of a resolution shell, and <math>\sigma^2_{\epsilon}</math>, the average, over all unique reflections of a resolution shell, of the sample variance of the mean of each reflection.
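A minimal Python sketch of the CC<sub>1/2</sub> formula and of the sign rule for ΔCC<sub>1/2</sub>. It assumes that σ²<sub>y</sub> and σ²<sub>ε</sub> have already been computed for one resolution shell; the function names are hypothetical and are not taken from [[XDSCC12]], and the input numbers are illustrative only.

```python
def cc_half_sigma_tau(var_y: float, var_eps: float) -> float:
    """CC1/2 from the sigma-tau formula (var_y - var_eps/2) / (var_y + var_eps/2).

    var_y:   variance of the averaged intensities in a resolution shell
    var_eps: average variance of the mean (half-dataset scale) in that shell
    Raises ZeroDivisionError if var_y + var_eps / 2 == 0.
    """
    return (var_y - 0.5 * var_eps) / (var_y + 0.5 * var_eps)


def delta_cc_half(cc_overall: float, cc_without_i: float) -> float:
    """DeltaCC1/2 = CC1/2_overall - CC1/2_i."""
    return cc_overall - cc_without_i


def is_rejection_candidate(delta: float) -> bool:
    """Negative DeltaCC1/2: omitting dataset i raises CC1/2, so i impairs the merged data."""
    return delta < 0


# Illustrative numbers only: omitting dataset i lowers var_eps from 2.0 to 1.0.
cc_all = cc_half_sigma_tau(var_y=3.0, var_eps=2.0)      # (3 - 1) / (3 + 1)
cc_without = cc_half_sigma_tau(var_y=3.0, var_eps=1.0)  # (3 - 0.5) / (3 + 0.5)
delta = delta_cc_half(cc_all, cc_without)
print(cc_all, cc_without, delta, is_rejection_candidate(delta))
```

Here omitting dataset i increases CC<sub>1/2</sub>, so ΔCC<sub>1/2</sub> is negative and the dataset is flagged.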

== Applications ==

The ΔCC<sub>1/2</sub> method is applicable to single frames, serial synchrotron crystallography (SSX) data and serial femtosecond crystallography (SFX) data. The program [[XDSCC12]] calculates ΔCC<sub>1/2</sub> for the isomorphous and the anomalous signal from XDS_ASCII.HKL and XSCALE.HKL files. A detailed description of the calculation and its implementation is found at [[CC1/2]].

== Implementation ==

===''' <math>\sigma^2_{y} </math>'''===

The unbiased sample variance of the averaged intensities of all unique reflections is calculated by:

<math>\sigma^2_{y} = \frac{1}{n-1} \cdot \left( \sum^n_{i} x^2_i - \frac{\left( \sum^n_{i} x_{i} \right)^2}{n} \right)</math>

where <math>x_{i}</math> is the average intensity of all observations, from all frames/crystals, of unique reflection i, and n is the number of unique reflections in the resolution shell.
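The σ²<sub>y</sub> formula can be sketched in Python as below. It assumes the per-reflection averages x<sub>i</sub> of one resolution shell are already available as a list; the name <code>sigma2_y</code> and the intensities are hypothetical. The result is cross-checked against <code>statistics.variance</code> from the Python standard library, which computes the same unbiased estimator.

```python
from statistics import variance


def sigma2_y(mean_intensities):
    """Unbiased sample variance of the averaged intensities x_i of the
    n unique reflections of one resolution shell:
    (sum x_i^2 - (sum x_i)^2 / n) / (n - 1)."""
    x = [float(v) for v in mean_intensities]
    n = len(x)
    if n < 2:
        raise ValueError("need at least two unique reflections in the shell")
    sum_x = sum(x)
    sum_x2 = sum(v * v for v in x)
    return (sum_x2 - sum_x * sum_x / n) / (n - 1)


# Hypothetical averaged intensities of five unique reflections in one shell.
shell_means = [120.0, 80.0, 45.5, 210.0, 15.0]
print(sigma2_y(shell_means))
print(variance(shell_means))  # standard-library reference, same estimator
```

The one-pass form mirrors the formula as written; for very large intensities with a small spread it can lose precision through cancellation, in which case a two-pass computation (as <code>statistics.variance</code> uses) is the safer choice.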

----

===''' <math>\sigma^2_{\epsilon} </math>''' - unweighted===

To obtain the average of the sample variances of the mean across all unique reflections of a resolution shell, the sample variance of the mean is first calculated for every unique reflection i:

<math>\sigma^2_{\epsilon i} = \frac{1}{n-1} \cdot \left( \sum^n_{j} x^2_{j} - \frac{\left( \sum^n_{j} x_{j} \right)^2}{n} \right) \cdot \frac{2}{n}</math>

where <math>x_{j}</math> is a single observation j, and n is here the number of observations of reflection i. The sample variance is divided by <math>\frac{n}{2}</math> (multiplied by <math>\frac{2}{n}</math>) because the quantity of interest is the variance of a sample mean. Since CC<sub>1/2</sub> correlates two half-datasets, each containing n/2 of the observations, the divisor is <math>\frac{n}{2}</math> and not '''n''', which would give the variance of the mean of all n observations as described in [https://en.wikipedia.org/wiki/Sample_mean_and_covariance#Variance_of_the_sample_mean Variance of the sample mean]. <math>\sigma^2_{\epsilon}</math> is then the average of <math>\sigma^2_{\epsilon i}</math> over all unique reflections of the resolution shell.
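The unweighted σ²<sub>ε</sub> and the leave-one-out ΔCC<sub>1/2</sub> can be combined into one sketch for a single resolution shell. Assumptions not stated in the text above: observations are hypothetical <code>(dataset_id, reflection_id, intensity)</code> tuples of already scaled intensities; reflections with fewer than two observations are skipped because n-1 would be zero; σ²<sub>y</sub> and σ²<sub>ε</sub> are computed over the same set of reflections. The toy data are invented for illustration: dataset D is deliberately inconsistent with datasets A to C.

```python
from collections import defaultdict
from statistics import fmean


def sample_variance(values):
    """Unbiased sample variance: (sum x^2 - (sum x)^2 / n) / (n - 1)."""
    n = len(values)
    s1 = sum(values)
    s2 = sum(v * v for v in values)
    return (s2 - s1 * s1 / n) / (n - 1)


def cc_half_shell(observations):
    """CC1/2 of one resolution shell with the unweighted sigma-tau method.

    observations: (dataset_id, reflection_id, intensity) tuples (assumed layout).
    Reflections with fewer than two observations are skipped.
    """
    by_reflection = defaultdict(list)
    for _dataset, reflection, intensity in observations:
        by_reflection[reflection].append(intensity)

    means = []      # x_i: average intensity of unique reflection i
    var_eps_i = []  # sigma^2_eps,i: sample variance divided by n/2
    for values in by_reflection.values():
        n = len(values)
        if n < 2:
            continue
        means.append(fmean(values))
        var_eps_i.append(sample_variance(values) / (n / 2))

    if len(means) < 2:
        raise ValueError("need at least two reflections with two or more observations")
    var_y = sample_variance(means)
    var_eps = fmean(var_eps_i)  # unweighted average over unique reflections
    return (var_y - 0.5 * var_eps) / (var_y + 0.5 * var_eps)


def delta_cc_half_per_dataset(observations):
    """Return {dataset_id: CC1/2_overall - CC1/2_i} for every dataset i."""
    observations = list(observations)
    cc_overall = cc_half_shell(observations)
    datasets = sorted({dataset for dataset, _, _ in observations})
    return {
        d: cc_overall - cc_half_shell([o for o in observations if o[0] != d])
        for d in datasets
    }


# Invented toy shell: four unique reflections measured in datasets A-D.
# A, B and C agree with each other; D is deliberately inconsistent.
obs = [
    ("A", "r1", 11.0), ("A", "r2", 19.0), ("A", "r3", 31.0), ("A", "r4", 39.0),
    ("B", "r1", 9.0), ("B", "r2", 21.0), ("B", "r3", 29.0), ("B", "r4", 41.0),
    ("C", "r1", 10.0), ("C", "r2", 20.0), ("C", "r3", 30.0), ("C", "r4", 40.0),
    ("D", "r1", 40.0), ("D", "r2", 30.0), ("D", "r3", 20.0), ("D", "r4", 10.0),
]
for dataset, delta in delta_cc_half_per_dataset(obs).items():
    print(dataset, round(delta, 3))
```

In this toy example only D receives a negative ΔCC<sub>1/2</sub>, so it is the candidate for exclusion. A production implementation such as [[XDSCC12]] may treat weighting and edge cases differently.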

===''' <math>\sigma^2_{\epsilon} </math>''' - weighted===

To be edited.
