Xds nonisomorphism

[ftp://turn5.biologie.uni-konstanz.de/pub/linux_bin/xds_nonisomorphism xds_nonisomorphism][ftp://turn5.biologie.uni-konstanz.de/pub/sources/xds_nonisomorphism.f90] is a program that analyzes data sets stored in unmerged reflection files (typically called XDS_ASCII.HKL) as written by [[XDS]]. It implements equation 2 of the theory of [https://doi.org/10.1107/S2059798317000699 Diederichs (2017)]. Its purpose is to identify non-isomorphous (i.e. dissimilar or less closely related) data sets among other, more similar ones. Based on the output of xds_nonisomorphism, the user may choose to merge only the most isomorphous (similar) data sets and to discard the non-isomorphous ones, or to analyze these separately. The program does not make this choice automatically; instead, the user is expected to select the isomorphous data sets from the program output and to scale them, e.g. with [[XSCALE]].


It should be noted that the result of the analysis does not depend on the amount of random error, and hence not on the strength of the data sets: it works equally well for weakly or strongly exposed crystals, and for tiny or big ones.


xds_nonisomorphism prints a short help text if the -h option is used.
== Calculation ==


In particular, for each pair of data sets it determines  
* the CC* values (Karplus & Diederichs (2012). Science 336, 1030–1033) from the [[CC1/2]] of the data sets (using the σ-τ method of Assmann ''et al.'', J. Appl. Cryst. (2016). 49, 1021–1028) in columns 3 and 4 of the output, and  
* the pairwise (Pearson's) correlation coefficients (column 5).  
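The two quantities listed above can be sketched in a few lines. This is a minimal illustration, not the program's code: it assumes that each data set has already been reduced to one intensity per unique reflection (a dict keyed by hypothetical (h, k, l) tuples) and that CC1/2 is already known, because the σ-τ estimate of Assmann ''et al.'' is not reproduced here. The function names are hypothetical; only the CC* formula, CC* = sqrt(2·CC1/2 / (1 + CC1/2)), is taken from Karplus & Diederichs (2012). The program presumably evaluates these quantities per resolution bin.

```python
import math
from typing import Dict, List, Sequence, Tuple

Hkl = Tuple[int, int, int]  # hypothetical key type: Miller indices of a unique reflection


def cc_star(cc_half: float) -> float:
    """CC* = sqrt(2*CC1/2 / (1 + CC1/2)) (Karplus & Diederichs, 2012)."""
    if cc_half <= 0.0:
        # The formula has no real value for negative CC1/2; treat as "no signal".
        return 0.0
    return math.sqrt(2.0 * cc_half / (1.0 + cc_half))


def pearson_cc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def common_intensities(set1: Dict[Hkl, float],
                       set2: Dict[Hkl, float]) -> Tuple[List[float], List[float]]:
    """Intensities of the reflections present in both data sets, matched by hkl."""
    keys = sorted(set1.keys() & set2.keys())
    return [set1[k] for k in keys], [set2[k] for k in keys]
```

Only reflections measured in both data sets contribute to the pairwise correlation, which is why the intensities are matched by hkl before `pearson_cc` is called.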
== Analysis and interpretation ==


Angles (calculated as the inverse cosine of the ratio) are expressed in degrees. Less than 10° may be considered good isomorphism, whereas 90° means highly non-isomorphous (i.e. completely unrelated) data sets. However, as actual output tables show, the numerical value of an angle (and thus the interpretation of its magnitude) depends on the resolution. The ratio (column 6) can also be interpreted differently: not as cos(phi) but as a factor. To make use of this interpretation, the program uses a formula (McCoy et al. (2017) PNAS 114, 3637-3641, equation 1) that relates the coordinate difference (column 8 of the output) to this factor. This coordinate RMSD value should be independent of resolution. If it is ''not'' (which is sometimes seen in pairwise comparisons of data sets), this indicates that some other systematic difference, one that cannot be interpreted as a coordinate difference, exists between the data sets. Candidates include many sources of systematic error, e.g. errors in data processing, twinning, overloads, vibrations ...
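The two interpretations of the ratio can be illustrated with a short sketch. It rests on two assumptions that are not spelled out on this page. First, the ratio is taken to be the pairwise CC divided by the product of the two CC* values, which is our reading of equation 2 of Diederichs (2017) (CC = CC*<sub>1</sub>·CC*<sub>2</sub>·cos(phi)). Second, the coordinate-difference formula is taken in the simple Gaussian-error form factor = exp(-(2π²/3)·RMSD²/d²), with d the resolution of the bin in Å. Equation 1 of McCoy et al. (2017) may contain further terms, and the program's implementation may differ. All function names are hypothetical.

```python
import math


def isomorphism_ratio(cc_12: float, cc_star_1: float, cc_star_2: float) -> float:
    """ASSUMED definition of the column-6 ratio: CC_12 / (CC*_1 * CC*_2)."""
    return cc_12 / (cc_star_1 * cc_star_2)


def angle_deg(ratio: float) -> float:
    """Angle in degrees, interpreting the ratio as cos(phi)."""
    # Noise can push the ratio slightly outside [-1, 1]; clamp before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, ratio))))


def coordinate_rmsd(factor: float, d: float) -> float:
    """RMS coordinate difference (Angstrom) for a resolution bin at d (Angstrom).

    Inverts the ASSUMED relation factor = exp(-(2*pi**2/3) * rmsd**2 / d**2).
    """
    if factor <= 0.0:
        return math.inf  # no correlation left: the formula gives no finite RMSD
    if factor >= 1.0:
        return 0.0
    return d * math.sqrt(-3.0 * math.log(factor) / (2.0 * math.pi ** 2))
```

The sketch shows why the RMSD, unlike the angle, should not depend on resolution: a fixed coordinate difference of, say, 0.3 Å lowers the factor more strongly at 2 Å than at 4 Å, so the angle grows with resolution, but `coordinate_rmsd` maps both factors back to the same 0.3 Å.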


After the analysis, the program produces a 3D representation of the data sets, arranged such that their distances in 3D approximate the angles between them (averaged across resolution bins). Please note that this representation is completely different from that of [[xscale_isocluster]]!



