XSCALE ISOCLUSTER: Difference between revisions

(2 intermediate revisions by the same user not shown)


@@ Line 33: / Line 33: @@
 Furthermore, a file iso.pdb is produced that may be loaded into coot. Then use Show/Cell and Symmetry/Show unit cell (to see the origin, which coot marks with "0"), and visualize the relations between data sets. Systematic differences are related to the angle (with the tip of the angle at the origin) between the vectors that represent the data sets; ideally, in the case of isomorphous data sets, all vectors point in the same direction. Random differences are related to the lengths of the vectors (starting at the origin; short vectors correspond to weak/noisy data sets). With the -i option, individual iso.x.pdb files can be written for each cluster. For an example, see [[SSX]].
+=== Resolving indexing ambiguity ===
 A useful set of options for resolving an indexing ambiguity is shown in the following example:
   xscale_isocluster -i -dim 2 -clu 2 -dmin 20 -dmax 2.5 XSCALE.HKL
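The geometric reading of iso.pdb described in this hunk (angles between vectors for systematic differences, vector lengths for random differences) can be illustrated with a minimal Python sketch. The coordinates are made up for illustration and are not taken from any real iso.pdb file; the helper names <code>length</code> and <code>angle_deg</code> are hypothetical and not part of xscale_isocluster.

```python
import math

def length(v):
    """Euclidean length of a vector that starts at the origin."""
    return math.sqrt(sum(c * c for c in v))

def angle_deg(u, v):
    """Angle in degrees between two vectors, with the tip of the angle at the origin."""
    cos = sum(a * b for a, b in zip(u, v)) / (length(u) * length(v))
    cos = max(-1.0, min(1.0, cos))  # guard against rounding slightly outside [-1, 1]
    return math.degrees(math.acos(cos))

# Made-up 2D coordinates (as one might see with -dim 2); not real program output.
a = (0.9, 0.1)    # strong data set
b = (0.45, 0.05)  # same direction as a, half the length: weaker but isomorphous with a
c = (0.1, 0.9)    # different direction: systematically different from a

print(round(angle_deg(a, b), 3), round(angle_deg(a, c), 1))
print(round(length(a), 3), round(length(b), 3))
```

In this toy case, a and b differ only in length (random error), whereas a and c differ in direction (systematic difference), which is the distinction the text draws.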
@@ Line 48: / Line 49: @@
 The console output gives informational and error messages. Each file XSCALE.x.INP enumerates the contributing INPUT_FILEs in the order of increasing angular distance. Example:
 <pre>
-UNIT_CELL_CONSTANTS=  91.490  91.490   68.790   90.000   90.000  120.000
+UNIT_CELL_CONSTANTS=   88.740   88.740  104.930   90.000   90.000  120.000
-SPACE_GROUP_NUMBER= 145
+SPACE_GROUP_NUMBER= 152
 OUTPUT_FILE=XSCALE.1.HKL
-FRIEDEL'S_LAW=FALSE
 SAVE_CORRECTION_IMAGES=FALSE
-WFAC1=1
+PRINT_CORRELATIONS=FALSE
-INPUT_FILE=../x4/XDS_ASCII.HKL
+WFAC1=1.25 ! XDS/XSCALE defaults are 1.0/1.5
-!new, old ISET=      1      3 strength,dist,cluster=     0.855     0.035      1
+INPUT_FILE=../xds_ss091d3chip/1501_1506/XDS_ASCII.HKL
+!new, old ISET=      1    134 length=CC*,angle,cluster=     0.120     0.4      1
 !INCLUDE_RESOLUTION_RANGE=00 00
-INPUT_FILE=../x3/XDS_ASCII.HKL
+WEIGHT=     1.000
-!new, old ISET=      2      2 strength,dist,cluster=     0.861     0.045      1
+INPUT_FILE=../xds_ss091c10chip/2281_2286/XDS_ASCII.HKL
+!new, old ISET=      2     96 length=CC*,angle,cluster=     0.922     1.9      1
 !INCLUDE_RESOLUTION_RANGE=00 00
-INPUT_FILE=../x9/XDS_ASCII.HKL
+WEIGHT=     1.001
-!new, old ISET=      3      7 strength,dist,cluster=     0.852     0.112      1
+INPUT_FILE=../xds_ss091b11chip/751_756/XDS_ASCII.HKL
+!new, old ISET=      3     46 length=CC*,angle,cluster=     0.556     2.1      1
 !INCLUDE_RESOLUTION_RANGE=00 00
-INPUT_FILE=../x7/XDS_ASCII.HKL
+WEIGHT=     1.001
-!new, old ISET=      4      6 strength,dist,cluster=     0.902     0.155      1
+INPUT_FILE=../xds_ss091a11chip/121_126/XDS_ASCII.HKL
-!INCLUDE_RESOLUTION_RANGE=00 00
+!new, old ISET=      4     14 length=CC*,angle,cluster=     0.602    22.8      1
-INPUT_FILE=../x1/XDS_ASCII.HKL
-!new, old ISET=      5      1 strength,dist,cluster=     0.749     0.173      1
-!INCLUDE_RESOLUTION_RANGE=00 00
-INPUT_FILE=../x5/XDS_ASCII.HKL
-!new, old ISET=      6      4 strength,dist,cluster=     0.678     0.223      1
-!INCLUDE_RESOLUTION_RANGE=00 00
-INPUT_FILE=../x6/XDS_ASCII.HKL
-!new, old ISET=      7      5 strength,dist,cluster=     0.788     0.406      1
 !INCLUDE_RESOLUTION_RANGE=00 00
+...
 </pre>
-Each INPUT_FILE line is followed by a comment line. In this, the first two numbers (''new'' and ''old'') refer to the numbering of datasets in the resulting XSCALE.#.INP,  ''versus'' that in the original XSCALE.INP (which produced XSCALE_FILE). Then, ''dist'' refers to arccosine of the angle (e.g. a value of 1.57 would mean 90 degrees) to the center of the cluster (the lower the better/closer), ''strength'' refers to vector length which is inversely proportional to the random noise in a data set, and ''cluster'', if negative, identifies a dataset that is outside the core of the cluster. To select good datasets and reject bad ones, the user may comment out INPUT_FILE lines which refer to datasets that are far away in angle or outside the core of the cluster. Furthermore, resolution ranges may be specified, possibly based on the output of [[XDSCC12]].
+Each INPUT_FILE line is followed by a comment line. In it, the first two numbers (''new'' and ''old'') give the number of the dataset in the resulting XSCALE.#.INP ''versus'' that in the original XSCALE.INP (which produced XSCALE_FILE). The three values after ''length=CC*,angle,cluster'' are the vector length, which is inversely proportional to the random noise in a data set; the angle (in degrees) to the center of the cluster (the lower the better/closer); and ''cluster'', which, if negative, identifies a dataset that is outside the core of the cluster. To select good datasets and reject bad ones, the user may comment out INPUT_FILE lines which refer to datasets that are far away in angle or outside the core of the cluster. Furthermore, resolution ranges may be specified, possibly based on the output of [[XDSCC12]].
 == Notes ==
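The manual selection step described in the hunk above (commenting out INPUT_FILE lines that are far away in angle or outside the core of the cluster) could be scripted roughly as follows. This is a sketch under assumptions: it expects the new ''length=CC*,angle,cluster'' comment format to follow each INPUT_FILE line directly, the function name <code>comment_out_outliers</code> is hypothetical, and the 10 degree cutoff is an arbitrary illustration rather than a recommended value. Only INPUT_FILE lines are touched; other per-dataset lines such as WEIGHT= or INCLUDE_RESOLUTION_RANGE= would still need to be reviewed by hand.

```python
import re

# Matches the comment line that follows each INPUT_FILE line, e.g.
# "!new, old ISET=  4  14 length=CC*,angle,cluster=  0.602  22.8  1"
COMMENT_RE = re.compile(
    r"^!new, old ISET=\s*(\d+)\s+(\d+)\s+length=CC\*,angle,cluster=\s*"
    r"([-\d.]+)\s+([-\d.]+)\s+(-?\d+)\s*$"
)

def comment_out_outliers(lines, max_angle_deg):
    """Return a copy of lines in which rejected INPUT_FILE lines are prefixed with '!'."""
    out = list(lines)
    for i in range(len(out) - 1):
        if not out[i].startswith("INPUT_FILE="):
            continue
        m = COMMENT_RE.match(out[i + 1].strip())
        if m is None:
            continue  # unexpected layout: leave the line untouched
        angle = float(m.group(4))   # angle (degrees) to the cluster center
        cluster = int(m.group(5))   # negative: outside the core of the cluster
        if angle > max_angle_deg or cluster < 0:
            out[i] = "!" + out[i]
    return out

# Lines copied from the example listing in the hunk above.
example = [
    "INPUT_FILE=../xds_ss091b11chip/751_756/XDS_ASCII.HKL",
    "!new, old ISET=      3     46 length=CC*,angle,cluster=     0.556     2.1      1",
    "!INCLUDE_RESOLUTION_RANGE=00 00",
    "INPUT_FILE=../xds_ss091a11chip/121_126/XDS_ASCII.HKL",
    "!new, old ISET=      4     14 length=CC*,angle,cluster=     0.602    22.8      1",
    "!INCLUDE_RESOLUTION_RANGE=00 00",
]
filtered = comment_out_outliers(example, max_angle_deg=10.0)  # cutoff is an assumption
print("\n".join(filtered))
```

With this cutoff, only the INPUT_FILE line of the dataset at an angle of 22.8 degrees is commented out; the dataset at 2.1 degrees is kept.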
@@ Line 83: / Line 79: @@
 * The clustering of data sets in a low-dimensional space uses the method of Rodriguez and Laio (2014) ''Science'' '''344''', 1492-1496. The clustering result should be checked by the user; one should not rely on this to give sensible results! The main criterion for a cluster should be that all data sets in it are in the same or similar direction, when seen from the origin ("0" in coot) - the length of each vector is not important since it is ''not'' related to the amount of non-isomorphism, but to the strength of the data set.
 * The eigenvalues are printed out by the program, and can be used to deduce the proper value of the required dimension n. To make use of this, one should run with a high value of dim (e.g. 5), and inspect the list of eigenvalues with the goal of finding a significant drop in magnitude (e.g. a factor of 3 drop between the second and third eigenvalue would point to the third eigenvector being of low importance).
+* Example: [[Scale many datasets]].
 * A different but related program is [[xds_nonisomorphism]].
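The eigenvalue inspection suggested in the notes above can be approximated by a small helper. This is a sketch only: it assumes the eigenvalues have been copied by hand from the program output in decreasing order (the output format itself is not parsed here), the function name <code>suggest_dim</code> is hypothetical, and taking the first drop by at least a factor of 3 is one possible reading of "a significant drop". The numbers are made up.

```python
def suggest_dim(eigenvalues, min_drop=3.0):
    """Return how many leading eigenvalues precede the first drop by a factor
    of at least min_drop, or None if no such drop is found."""
    for k in range(1, len(eigenvalues)):
        prev, cur = eigenvalues[k - 1], eigenvalues[k]
        if cur <= 0 or prev / cur >= min_drop:
            return k
    return None

# Made-up eigenvalues from a hypothetical run with -dim 5.
eigs = [10.0, 6.0, 1.5, 1.2, 1.0]
print(suggest_dim(eigs))  # prints 2: the drop from 6.0 to 1.5 is a factor of 4
```

A result of None means that no clear drop was found, in which case the eigenvalue list should be judged by eye.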