No edit summary |
No edit summary |
||
Line 1: | Line 1: | ||
[ftp://turn5.biologie.uni-konstanz.de/pub/xscale_isocluster_linux.bz2 xscale_isocluster] is a program that clusters datasets stored in a single unmerged reflection file as written by [[XSCALE]]. | [ftp://turn5.biologie.uni-konstanz.de/pub/xscale_isocluster_linux.bz2 xscale_isocluster] is a program that clusters datasets stored in a single unmerged reflection file as written by [[XSCALE]]. | ||
The help output is | The help output (obtained by using the <code>-h</code> option) is | ||
<pre> | <pre> | ||
xscale_isocluster KD 2016-12-20. -h option shows options | xscale_isocluster KD 2016-12-20. -h option shows options | ||
Line 19: | Line 19: | ||
For dataset analysis, the program uses the method of [https://dx.doi.org/10.1107/S1399004713025431 Brehm and Diederichs (2014) ''Acta Cryst'' '''D70''', 101-109] ([https://kops.uni-konstanz.de/bitstream/handle/123456789/26319/Brehm_263191.pdf?sequence=2&isAllowed=y PDF]) whose theoretical background is in [https://doi.org/10.1107/S2059798317000699 Diederichs (2017) ''Acta Cryst'' '''D73''', 286-293] (open access). This results in an arrangement of N datasets represented by N vectors in a low-dimensional space. Typically, the dimension of that space may be chosen as n=2 to 4, but may be higher if N is large (see below). n=1 would be suitable if the datasets only differ in their random error. One more dimension is required for each additional systematic property which may vary between the datasets, e.g. n=2 is suitable if they only differ in their indexing mode (which then only should have two alternatives!), or in some other systematic property, like the length of the a axis. Higher values of n (e.g. n=4) are appropriate if e.g. there are 4 indexing possibilities (which is the case in P3<sub>x</sub>), or more systematic ways in which the datasets may differ (like significant variations in a, b and c axes). In cases where datasets differ e.g. with respect to the composition or conformation of crystallized molecules, it is ''a priori'' unknown which value of n should be chosen, and several values need to be tried, and the results inspected. | For dataset analysis, the program uses the method of [https://dx.doi.org/10.1107/S1399004713025431 Brehm and Diederichs (2014) ''Acta Cryst'' '''D70''', 101-109] ([https://kops.uni-konstanz.de/bitstream/handle/123456789/26319/Brehm_263191.pdf?sequence=2&isAllowed=y PDF]) whose theoretical background is in [https://doi.org/10.1107/S2059798317000699 Diederichs (2017) ''Acta Cryst'' '''D73''', 286-293] (open access). This results in an arrangement of N datasets represented by N vectors in a low-dimensional space. 
Typically, the dimension of that space may be chosen as n=2 to 4, but may be higher if N is large (see below). n=1 would be suitable if the datasets only differ in their random error. One more dimension is required for each additional systematic property which may vary between the datasets, e.g. n=2 is suitable if they only differ in their indexing mode (which then only should have two alternatives!), or in some other systematic property, like the length of the a axis. Higher values of n (e.g. n=4) are appropriate if e.g. there are 4 indexing possibilities (which is the case in P3<sub>x</sub>), or more systematic ways in which the datasets may differ (like significant variations in a, b and c axes). In cases where datasets differ e.g. with respect to the composition or conformation of crystallized molecules, it is ''a priori'' unknown which value of n should be chosen, and several values need to be tried, and the results inspected. | ||
For meaningful results, the number of known values (N*(N-1)/2 is the number of pairwise correlation coefficients) should be (preferably much) higher than the number of unknowns (1+n*(N-1)). | The program writes files called XSCALE.1.INP with lines required for scaling the datasets of cluster 1, and similarly XSCALE.2.INP for cluster 2, and so on. Typically, one should create subdirectories 1 2 ..., and then create symlinks in these called XSCALE.INP to the XSCALE.#.INP files. This enables separate scaling of each cluster. | ||
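The subdirectory and symlink setup described for the XSCALE.#.INP files could be scripted roughly as follows. This is only a sketch: the function name <code>link_cluster_inputs</code> and its <code>workdir</code> parameter are hypothetical and not part of xscale_isocluster, and it assumes a filesystem that supports symlinks and that the XSCALE.#.INP files sit directly in <code>workdir</code>.

```python
import re
from pathlib import Path


def link_cluster_inputs(workdir="."):
    """Create subdirectories 1, 2, ... each with a symlink XSCALE.INP.

    Hypothetical helper (not part of xscale_isocluster): for every file
    XSCALE.<k>.INP found in workdir, make subdirectory <k> and a relative
    symlink <k>/XSCALE.INP -> ../XSCALE.<k>.INP.
    """
    workdir = Path(workdir)
    links = []
    for inp in sorted(workdir.glob("XSCALE.*.INP")):
        match = re.fullmatch(r"XSCALE\.(\d+)\.INP", inp.name)
        if match is None:
            continue  # skip files that do not follow the XSCALE.<k>.INP pattern
        subdir = workdir / match.group(1)
        subdir.mkdir(exist_ok=True)
        link = subdir / "XSCALE.INP"
        # Leave an existing file or symlink untouched instead of overwriting it.
        if not link.exists() and not link.is_symlink():
            link.symlink_to(Path("..") / inp.name)
        links.append(link)
    return links
```

Calling <code>link_cluster_inputs(".")</code> in the directory where xscale_isocluster was run would prepare one subdirectory per cluster, so that each cluster can then be scaled separately from within its own subdirectory.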
For meaningful results, the number of known values (N*(N-1)/2 is the number of pairwise correlation coefficients) should be (preferably much) higher than the number of unknowns (1+n*(N-1)). This means that one needs at least 5 datasets if dim=2, and at least 7 if dim=3. | |||
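The counting argument can be checked with a few lines of Python. This is an illustrative sketch only; the function names are made up for this example and it merely evaluates the two formulas quoted above, N*(N-1)/2 knowns against 1+n*(N-1) unknowns, to find the smallest N for which the knowns exceed the unknowns.

```python
def n_knowns(N):
    """Number of pairwise correlation coefficients between N datasets."""
    return N * (N - 1) // 2


def n_unknowns(N, n):
    """Number of unknowns for N datasets in n dimensions: 1 + n*(N-1)."""
    return 1 + n * (N - 1)


def min_datasets(n):
    """Smallest N for which the knowns strictly exceed the unknowns."""
    N = 2
    # The knowns grow quadratically in N and the unknowns only linearly,
    # so this loop always terminates.
    while n_knowns(N) <= n_unknowns(N, n):
        N += 1
    return N


for n in (2, 3):
    N = min_datasets(n)
    print(f"n={n}: at least {N} datasets "
          f"({n_knowns(N)} knowns vs {n_unknowns(N, n)} unknowns)")
```

For n=2 this gives N=5 (10 knowns against 9 unknowns) and for n=3 it gives N=7 (21 against 19). These are bare minima; since the knowns should preferably be much more numerous than the unknowns, considerably more datasets are desirable in practice.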
The clustering of datasets in the low-dimensional space uses the method of Rodriguez and Laio (2014) ''Science'' '''344''', 1492-1496. | The clustering of datasets in the low-dimensional space uses the method of Rodriguez and Laio (2014) ''Science'' '''344''', 1492-1496. |