XSCALE ISOCLUSTER

For dataset analysis, the program uses the method of [https://dx.doi.org/10.1107/S1399004713025431 Brehm and Diederichs (2014) ''Acta Cryst'' '''D70''', 101-109] ([https://kops.uni-konstanz.de/bitstream/handle/123456789/26319/Brehm_263191.pdf?sequence=2&isAllowed=y PDF]), whose theoretical background is given in [https://doi.org/10.1107/S2059798317000699 Diederichs (2017) ''Acta Cryst'' '''D73''', 286-293] (open access). The method represents the N datasets as N vectors in a low-dimensional space. Typically, the dimension n of that space is chosen as 2 to 4, but it may be higher if N is large (see below). n=1 is suitable if the datasets differ only in their random error. One more dimension is required for each additional systematic property that may vary between the datasets: n=2 is suitable if they differ only in their indexing mode (which should then have only two alternatives!), or in one other systematic property, such as the length of the a axis. Higher values of n (e.g. n=4) are appropriate if there are 4 indexing possibilities (which is the case in P3<sub>x</sub>), or if the datasets may differ in several systematic ways (such as significant variations in the a, b and c axes). If datasets differ, for example, in the composition or conformation of the crystallized molecules, the appropriate value of n is unknown ''a priori''; several values need to be tried and the results inspected.


The program writes files called XSCALE.1.INP with lines required for scaling the datasets of cluster 1, and similarly XSCALE.2.INP for cluster 2, and so on. Typically, one may want to create directories cluster1 cluster2 ..., and then establish symlinks (called XSCALE.INP) in these to the XSCALE.#.INP files. This enables separate scaling of each cluster.   
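
The directory and symlink setup described above can be scripted. The following is a minimal sketch in Python, not part of the program itself; the function name <code>link_cluster_inputs</code> is hypothetical, and the directory names <code>cluster1</code>, <code>cluster2</code>, ... simply follow the suggestion above.

```python
import re
from pathlib import Path


def link_cluster_inputs(workdir="."):
    """Create cluster<k>/XSCALE.INP symlinks for each XSCALE.<k>.INP in workdir.

    Hypothetical helper for illustration; returns the list of links created.
    """
    workdir = Path(workdir)
    created = []
    for inp in sorted(workdir.glob("XSCALE.*.INP")):
        # Accept only names of the form XSCALE.<number>.INP
        match = re.fullmatch(r"XSCALE\.(\d+)\.INP", inp.name)
        if match is None:
            continue
        cluster_dir = workdir / f"cluster{match.group(1)}"
        cluster_dir.mkdir(exist_ok=True)
        link = cluster_dir / "XSCALE.INP"
        # Replace a stale link or file left over from an earlier run
        if link.is_symlink() or link.exists():
            link.unlink()
        # Relative target, so the directory tree can be moved as a whole
        link.symlink_to(Path("..") / inp.name)
        created.append(link)
    return created


if __name__ == "__main__":
    for link in link_cluster_inputs("."):
        print(link)
```

Relative INPUT_FILE paths are resolved from the directory in which xscale is run, so it is worth checking that they still point to the data files before scaling in the cluster directories.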
   
   
== Output ==
 
The console output gives informational and error messages. Each file XSCALE.x.INP lists the contributing INPUT_FILEs in order of increasing angular distance from the center of the cluster. Example:
<pre>
UNIT_CELL_CONSTANTS=  91.490  91.490  68.790  90.000  90.000  120.000
SPACE_GROUP_NUMBER= 145
OUTPUT_FILE=XSCALE.1.HKL
FRIEDEL'S_LAW=FALSE
SAVE_CORRECTION_IMAGES=FALSE
WFAC1=1
INPUT_FILE=../x4/XDS_ASCII.HKL
!new, old ISET=      1      3 strength,dist,cluster=    0.855    0.035      1
!INCLUDE_RESOLUTION_RANGE=00 00
INPUT_FILE=../x3/XDS_ASCII.HKL
!new, old ISET=      2      2 strength,dist,cluster=    0.861    0.045      1
!INCLUDE_RESOLUTION_RANGE=00 00
INPUT_FILE=../x9/XDS_ASCII.HKL
!new, old ISET=      3      7 strength,dist,cluster=    0.852    0.112      1
!INCLUDE_RESOLUTION_RANGE=00 00
INPUT_FILE=../x7/XDS_ASCII.HKL
!new, old ISET=      4      6 strength,dist,cluster=    0.902    0.155      1
!INCLUDE_RESOLUTION_RANGE=00 00
INPUT_FILE=../x1/XDS_ASCII.HKL
!new, old ISET=      5      1 strength,dist,cluster=    0.749    0.173      1
!INCLUDE_RESOLUTION_RANGE=00 00
INPUT_FILE=../x5/XDS_ASCII.HKL
!new, old ISET=      6      4 strength,dist,cluster=    0.678    0.223      1
!INCLUDE_RESOLUTION_RANGE=00 00
INPUT_FILE=../x6//XDS_ASCII.HKL
!new, old ISET=      7      5 strength,dist,cluster=    0.788    0.406      1
!INCLUDE_RESOLUTION_RANGE=00 00
</pre>
 
Each INPUT_FILE line is followed by a comment line. In it, the first two numbers (''new'' and ''old'') give the number of the dataset in the resulting XSCALE.x.INP ''versus'' its number in the original XSCALE.INP (which produced XSCALE_FILE). The remaining three values are, in this order: ''strength'', the length of the dataset's vector, which is inversely proportional to the random noise in the dataset; ''dist'', the angle in radians between the dataset's vector and the center of the cluster (e.g. a value of 1.57 means 90 degrees; the lower the better/closer); and ''cluster'', which, if negative, identifies a dataset that is outside the core of the cluster. To select good datasets and reject bad ones, the user may comment out the INPUT_FILE lines of datasets that are far away in angle or outside the core of the cluster.
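
The selection step can also be done with a small script instead of by hand. The sketch below is only an illustration: it assumes the comment-line layout shown in the example above, the function name <code>filter_datasets</code> is hypothetical, and the 15 degree cutoff is an arbitrary placeholder rather than a recommended value.

```python
import math
import re

# Matches the comment line that follows each INPUT_FILE line, e.g.
# !new, old ISET=      1      3 strength,dist,cluster=    0.855    0.035      1
COMMENT_RE = re.compile(
    r"^!new, old ISET=\s*(\d+)\s+(\d+)\s+strength,dist,cluster=\s*"
    r"([-\d.]+)\s+([-\d.]+)\s+(-?\d+)"
)


def filter_datasets(lines, max_dist_deg=15.0):
    """Return a copy of lines with rejected INPUT_FILE lines commented out.

    A dataset is rejected if its dist (given in radians) exceeds
    max_dist_deg (an assumed, arbitrary cutoff) or if its cluster
    number is negative, i.e. it lies outside the core of the cluster.
    """
    out = list(lines)
    for i, line in enumerate(lines):
        m = COMMENT_RE.match(line.strip())
        if m is None or i == 0:
            continue
        dist_deg = math.degrees(float(m.group(4)))
        cluster = int(m.group(5))
        previous = lines[i - 1]
        if previous.lstrip().startswith("INPUT_FILE=") and (
            dist_deg > max_dist_deg or cluster < 0
        ):
            # "!" starts a comment, as in the comment lines of the example
            out[i - 1] = "!" + previous
    return out


if __name__ == "__main__":
    with open("XSCALE.1.INP") as handle:
        original = handle.read().splitlines()
    print("\n".join(filter_datasets(original)))
```

With the example file above and this cutoff, only the last dataset (dist 0.406, about 23 degrees) would be commented out. The script prints the result rather than overwriting the file, so the selection can be inspected first.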


== Notes ==