XSCALE ISOCLUSTER

xscale_isocluster [https://{{SERVERNAME}}/pub/linux_bin/xscale_isocluster (Linux binary)] [https://{{SERVERNAME}}/pub/mac_bin/xscale_isocluster (Mac binary)] is a program that clusters data sets stored in a single unmerged reflection file as written by [[XSCALE]]. It implements the method of [https://doi.org/10.1107/S1399004713025431 Brehm and Diederichs (2014)] and the theory of [https://doi.org/10.1107/S2059798317000699 Diederichs (2017)].


The help output (obtained by using the <code>-h</code> option) is
== Usage ==


For data set analysis, the program uses the method of [https://dx.doi.org/10.1107/S1399004713025431 Brehm and Diederichs (2014) ''Acta Cryst'' '''D70''', 101-109] ([https://kops.uni-konstanz.de/bitstream/handle/123456789/26319/Brehm_263191.pdf?sequence=2&isAllowed=y PDF]), whose theoretical background is given in [https://doi.org/10.1107/S2059798317000699 Diederichs (2017) ''Acta Cryst'' '''D73''', 286-293] (open access). This results in a segmentation, i.e. an arrangement of the N data sets as N vectors in a low-dimensional space. Typically, the dimension n of that space is chosen between 2 and 4, but it may be higher if N is large.
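In Brehm and Diederichs (2014) the vectors x_i are placed so that their scalar products approximate the correlation coefficients r_ij between pairs of data sets, i.e. the function Φ = Σ_{i<j} (r_ij − x_i·x_j)² is minimized. The Python sketch below (assuming NumPy; the function names are hypothetical and this is not the program's own code) minimizes Φ by alternating least squares, starting from the leading eigenvectors of the correlation matrix. xscale_isocluster itself may use a different minimizer, weighting, and treatment of missing pairs.

```python
import numpy as np


def fit_vectors(r, n, sweeps=200):
    """Arrange N data sets as N vectors in n dimensions so that x_i . x_j ~ r_ij.

    Minimizes Phi = sum_{i<j} (r_ij - x_i . x_j)^2 by alternating least
    squares: each x_i is solved exactly while all other vectors stay fixed.
    r is a symmetric (N, N) array of pairwise correlation coefficients;
    its diagonal is ignored.
    """
    r = np.asarray(r, dtype=float)
    N = r.shape[0]
    off = ~np.eye(N, dtype=bool)  # mask of the off-diagonal pairs

    # Spectral starting point: the n leading eigenvectors of r with its
    # diagonal set to zero, scaled by the square roots of their eigenvalues.
    w, v = np.linalg.eigh(np.where(off, r, 0.0))
    top = np.argsort(w)[::-1][:n]
    x = v[:, top] * np.sqrt(np.clip(w[top], 1e-6, None))

    for _ in range(sweeps):
        for i in range(N):
            # Least-squares solution of  x_j . x_i = r_ij  for all j != i
            x[i] = np.linalg.lstsq(x[off[i]], r[i, off[i]], rcond=None)[0]
    return x


def residual(r, x):
    """Value of Phi: squared misfit summed over all pairs i < j."""
    r = np.asarray(r, dtype=float)
    iu = np.triu_indices(r.shape[0], k=1)
    return float(np.sum((r - x @ x.T)[iu] ** 2))


if __name__ == "__main__":
    # Synthetic example: two groups of data sets whose vectors are orthogonal
    # (e.g. two indexing modes), each with strong, medium and weak members.
    true = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                     [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
    r = true @ true.T
    for n in (1, 2):
        x = fit_vectors(r, n)
        print(n, residual(r, x), np.linalg.norm(x, axis=1).round(3))
```

In this synthetic case the correlations between the two groups are zero, which no arrangement in n=1 can reproduce, so Φ stays clearly above zero for n=1, while n=2 should fit essentially exactly and recover the vector lengths 0.9, 0.8 and 0.7.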


n=1 is appropriate only if the data sets differ solely in their random error (i.e. they are highly isomorphous). One more dimension is required for each additional systematic property that may vary between the data sets; e.g. n=2 is suitable if they differ only in their indexing mode (which must then have only two alternatives!), or in one other systematic property, such as the length of a cell axis. Higher values of n (e.g. n=4) are appropriate if there are 4 indexing possibilities (as in P3<sub>x</sub>), or if the data sets may differ systematically in more ways (such as significant variations in the a, b and c axes, or conformational or compositional differences). In cases where data sets differ e.g. in the composition or conformation of the crystallized molecules, the appropriate value of n is not known ''a priori'', so several values need to be tried and the results inspected (see [[Xscale_isocluster#Notes]]).
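The number of indexing possibilities mentioned above equals the order of the lattice point group (holohedry) divided by the order of the Laue group of the crystal; for P3<sub>x</sub> on a hexagonal lattice this gives 24/6 = 4. The small Python sketch below computes this ratio as a possible starting value for n. The helper and its names are hypothetical and not part of xscale_isocluster, and further dimensions may still be needed for other systematic differences between the data sets.

```python
# Orders of the 11 Laue classes (centrosymmetric crystallographic point groups).
LAUE_ORDER = {
    "-1": 2, "2/m": 4, "mmm": 8,
    "4/m": 8, "4/mmm": 16,
    "-3": 6, "-3m": 12,
    "6/m": 12, "6/mmm": 24,
    "m-3": 24, "m-3m": 48,
}


def indexing_modes(laue, lattice):
    """Number of alternative indexing modes of a crystal.

    laue:    Laue class of the crystal, e.g. "-3" for P3, P3(1), P3(2).
    lattice: point group of the lattice metric (holohedry), e.g. "6/mmm" for a
             hexagonal lattice, or a higher group if the cell is pseudo-symmetric.
    The modes are the cosets of the Laue group in the lattice point group,
    so their number is the ratio of the two group orders.
    """
    q, rem = divmod(LAUE_ORDER[lattice], LAUE_ORDER[laue])
    if rem or q < 1:
        raise ValueError(f"group orders of {laue} and {lattice} are incompatible")
    return q


if __name__ == "__main__":
    print(indexing_modes("-3", "6/mmm"))   # P3x: 4 modes
    print(indexing_modes("-3m", "6/mmm"))  # e.g. P321: 2 modes
    print(indexing_modes("2/m", "2/m"))    # ordinary monoclinic cell: 1 mode
```

Passing a higher lattice group than the nominal one (e.g. "4/mmm" for a monoclinic cell that happens to be metrically tetragonal) models the additional indexing ambiguities caused by pseudo-symmetry.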


After segmentation of the data sets in n-dimensional space, the program may be used (by specifying the -clu <m> option; default m=1) to try to identify <m> clusters of data sets. The program writes a file XSCALE.1.INP with the lines required for scaling the data sets of cluster 1, similarly XSCALE.2.INP for cluster 2, and so on. Typically, one may want to create directories cluster1, cluster2, ..., and in each of them a symlink called XSCALE.INP that points to the corresponding XSCALE.#.INP file. This enables separate scaling of each cluster.
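The directory and symlink setup described above can be scripted. The Python sketch below (the function name is hypothetical) looks for files named XSCALE.<k>.INP in the working directory and creates cluster<k>/XSCALE.INP as a relative symlink to ../XSCALE.<k>.INP. It assumes that the paths inside the generated files are still valid when XSCALE is run from within the cluster subdirectory; this should be checked before scaling.

```python
import re
from pathlib import Path

# Matches the per-cluster input files, e.g. XSCALE.2.INP (but not XSCALE.INP)
CLUSTER_INP = re.compile(r"XSCALE\.(\d+)\.INP")


def link_clusters(workdir="."):
    """Create clusterK/XSCALE.INP -> ../XSCALE.K.INP for every XSCALE.K.INP found.

    Returns the list of created links, ordered by cluster number.
    """
    work = Path(workdir)
    found = [(int(m.group(1)), p) for p in work.iterdir()
             if (m := CLUSTER_INP.fullmatch(p.name))]
    links = []
    for k, inp in sorted(found):
        cdir = work / f"cluster{k}"
        cdir.mkdir(exist_ok=True)
        link = cdir / "XSCALE.INP"
        if link.is_symlink():
            link.unlink()  # replace a link left over from a previous run
        # A regular file called XSCALE.INP is never overwritten:
        # symlink_to() raises FileExistsError in that case.
        # Relative target, so the whole directory tree can be moved.
        link.symlink_to(Path("..") / inp.name)
        links.append(link)
    return links
```

Running it again after a new clustering run simply replaces the existing links, so each clusterK directory always refers to the latest XSCALE.K.INP.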


Furthermore, a file iso.pdb is produced that may be loaded into coot. Then use Show/Cell and Symmetry/Show unit cell (to see the origin, which coot marks with "0"), and visualize the relations between data sets. Systematic differences are reflected in the angles between the vectors that represent the data sets (with the vertex of each angle at the origin); ideally, for isomorphous data sets all vectors point in the same direction. Random differences are reflected in the lengths of the vectors (measured from the origin); short vectors correspond to weak or noisy data sets. With the -i option, individual iso.x.pdb files can be written for each cluster. For an example, see [[SSX]].
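The angles and lengths described above can also be computed numerically instead of being judged by eye in coot. The Python sketch below assumes (not documented here) that iso.pdb stores one data set per ATOM or HETATM record, with the components of its vector in the standard PDB coordinate columns; the function names are hypothetical.

```python
import math


def read_vectors(lines):
    """Return one (x, y, z) tuple per ATOM/HETATM record, read from the
    fixed PDB coordinate columns 31-38, 39-46 and 47-54."""
    vecs = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            vecs.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return vecs


def length(v):
    """Distance from the origin; short vectors indicate weak or noisy data sets."""
    return math.hypot(*v)


def angle_deg(u, v):
    """Angle between two data-set vectors, vertex at the origin, in degrees.

    Small angles mean the two data sets differ little systematically.
    """
    cos = sum(a * b for a, b in zip(u, v)) / (length(u) * length(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))
```

For example, after `with open("iso.pdb") as f: vecs = read_vectors(f)`, the value `angle_deg(vecs[0], vecs[1])` gives the angle between the first two data sets and `length(vecs[0])` the length of the first vector.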


== Output ==