Simulated-1g1c: Difference between revisions
== devising a bootstrap procedure ==
We have to realize that, since the b and c axes are equal, we can index each dataset in two non-equivalent ways. This is the same situation as occurs e.g. for spacegroups P3(x) and P4(x), and means that we'll have to use a REFERENCE_DATA_SET to get the right setting for each of the 100 datasets.
However, we cannot expect that all of the datasets have enough reflections in common with a given dataset. Thus, we have to update and enlarge the REFERENCE_DATA_SET after the first round, using those datasets that have reflections in common with the old REFERENCE_DATA_SET. Then in a second round, we can hopefully identify the correct setting for all datasets. After that, we can scale everything together.
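The idea behind resolving the indexing ambiguity can be illustrated with a small Python sketch. It is not what XDS does internally; it only assumes that intensities are compared under Laue symmetry mmm (so only the absolute values of h, k, l matter) and that the alternative setting amounts to exchanging k and l. The function names and the toy intensities are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 when it is undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def unique_key(hkl):
    """Reduce indices under Laue group mmm (assumption: Friedel pairs merged)."""
    return tuple(abs(i) for i in hkl)


def swap_kl(hkl):
    """Alternative setting: exchange the roles of the equal-length b and c axes."""
    h, k, l = hkl
    return (h, l, k)


def cc_with_reference(data, reference, reindex=None):
    """Return (CC, number of common reflections) of data against reference."""
    mine, theirs = [], []
    for hkl, intensity in data.items():
        key = unique_key(reindex(hkl) if reindex else hkl)
        if key in reference:
            mine.append(intensity)
            theirs.append(reference[key])
    return pearson(mine, theirs), len(mine)


def choose_setting(data, reference):
    """Pick the setting that correlates better with the reference."""
    cc_a, n_a = cc_with_reference(data, reference)
    cc_b, n_b = cc_with_reference(data, reference, swap_kl)
    if cc_b > cc_a:
        return "k,l swapped", cc_b, n_b
    return "as indexed", cc_a, n_a


# Toy reference and a toy dataset that was indexed in the other setting
# (all numbers are invented).
reference = {(1, 2, 3): 100.0, (1, 3, 2): 10.0, (2, 1, 4): 50.0,
             (2, 4, 1): 5.0, (0, 1, 2): 80.0, (0, 2, 1): 20.0}
dataset = {(1, 3, 2): 100.0, (2, 4, 1): 50.0, (0, 2, 1): 80.0}
print(choose_setting(dataset, reference))
```

With only a handful of common reflections the correlation is not reliable, which is why the reference has to be enlarged before all datasets can be assigned.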
== first round of bootstrap ==
We choose xtal100 as the first reference, and move its XDS_ASCII.HKL to bootstrap/reference.ahkl. A script that goes through all datasets, produces XDS.INP, and runs xds is the following (note that we only REFINE(IDXREF)= ORIENTATION BEAM , and the same for REFINE(INTEGRATE), since it may be useful to keep the b and c axes exactly the same):
<pre>
#!/bin/csh -f
foreach f ( Illuin/microfocus/xtal*_1_001.img )
setenv x `echo $f | cut -c 19-25`
echo processing $x
rm -rf bootstrap/$x
mkdir bootstrap/$x
cd bootstrap/$x
cat>XDS.INP<<EOF
JOB= XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE CORRECT
ORGX= 1511.2 ORGY= 1553.1 ! ORGX=1507 ORGY=1570 if BEAM is not refined
DETECTOR_DISTANCE= 250
OSCILLATION_RANGE= 1
X-RAY_WAVELENGTH= 0.979338
NAME_TEMPLATE_OF_DATA_FRAMES=../../Illuin/microfocus/${x}_1_0??.img
DATA_RANGE=1 1
SPOT_RANGE=1 1
REFERENCE_DATA_SET=../reference.ahkl
TEST_RESOLUTION_RANGE= 50.0 2.0 ! for correlating with reference
SPACE_GROUP_NUMBER=16 ! 0 if unknown
UNIT_CELL_CONSTANTS= 38.3 79.1 79.1 90 90 90 ! mean of CORRECT outputs
INCLUDE_RESOLUTION_RANGE=60 1.8 ! after CORRECT, insert high resol limit; re-run CORRECT
TRUSTED_REGION=0.00 1. ! partially use corners of detectors; 1.41421=full use
VALUE_RANGE_FOR_TRUSTED_DETECTOR_PIXELS=7000. 30000. ! often 8000 is ok
MINIMUM_ZETA=0.05 ! integrate close to the Lorentz zone; 0.15 is default
STRONG_PIXEL=5
MINIMUM_NUMBER_OF_PIXELS_IN_A_SPOT=3 ! default of 6 is sometimes too high
REFINE(INTEGRATE)= ORIENTATION BEAM ! AXIS DISTANCE CELL
REFINE(IDXREF)= ORIENTATION BEAM ! AXIS DISTANCE CELL
! parameters specifically for this detector and beamline:
DETECTOR= ADSC MINIMUM_VALID_PIXEL_VALUE= 1 OVERLOAD= 65000
NX= 3072 NY= 3072 QX= 0.102539 QY= 0.102539 ! to make CORRECT happy if frames are unavailable
DIRECTION_OF_DETECTOR_X-AXIS=1 0 0
DIRECTION_OF_DETECTOR_Y-AXIS=0 1 0
INCIDENT_BEAM_DIRECTION=0 0 1 ! 0.00203 -0.0065 1.02107 ! mean of CORRECT outputs
ROTATION_AXIS=1 0 0 ! at e.g. SERCAT ID-22 this needs to be -1 0 0
FRACTION_OF_POLARIZATION=0.98 ! better value is provided by beamline staff!
POLARIZATION_PLANE_NORMAL=0 1 0
EOF
xds >& xds.log &
sleep 1
cd ../..
end
</pre>
Running this script takes 2 minutes. Afterwards, it's a good idea to check whether the cell parameters are really what we assumed they are. To do so, we run
 grep UNIT_CELL_CO xtal0[01]*/XDS_ASCII.HKL | cut -c24- > CELLPARM.INP
 cellparm
 cat CELLPARM.LP
and obtain:
 A B C ALPHA BETA GAMMA WEIGHT
 38.311 79.096 79.107 90.000 90.000 90.000 1.0
 38.292 79.081 79.078 90.000 90.000 90.000 1.0
 38.285 79.021 79.048 90.000 90.000 90.000 1.0
 38.308 79.106 79.099 90.000 90.000 90.000 1.0
 38.298 79.096 79.084 90.000 90.000 90.000 1.0
 38.310 79.117 79.109 90.000 90.000 90.000 1.0
 38.317 79.120 79.124 90.000 90.000 90.000 1.0
 38.302 79.102 79.097 90.000 90.000 90.000 1.0
 38.309 79.119 79.134 90.000 90.000 90.000 1.0
 38.288 79.098 79.128 90.000 90.000 90.000 1.0
 38.294 79.102 79.119 90.000 90.000 90.000 1.0
 38.299 79.104 79.100 90.000 90.000 90.000 1.0
 38.296 79.113 79.058 90.000 90.000 90.000 1.0
 38.322 79.091 79.120 90.000 90.000 90.000 1.0
 38.284 79.082 79.094 90.000 90.000 90.000 1.0
 38.284 79.103 79.098 90.000 90.000 90.000 1.0
 38.303 79.109 79.111 90.000 90.000 90.000 1.0
 38.293 79.084 79.083 90.000 90.000 90.000 1.0
 38.300 79.095 79.101 90.000 90.000 90.000 1.0
 ------------------------------------------------------------------------
 38.300 79.097 79.100 90.000 90.000 90.000 19.0
Why not use all datasets? The reason is that cellparm has a limit of 20 datasets!
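If one wants to look at all 100 cells anyway, a plain arithmetic mean can be computed outside cellparm. The following Python sketch is only an approximation of what cellparm reports: it ignores weights, and it assumes that each XDS_ASCII.HKL header contains a line with UNIT_CELL_CONSTANTS= followed by six numbers. The function names are hypothetical.

```python
import re


def parse_cell(line):
    """Return the six cell parameters from a UNIT_CELL_CONSTANTS= line, else None."""
    match = re.search(r"UNIT_CELL_CONSTANTS=\s*(.*)", line)
    if match is None:
        return None
    fields = match.group(1).split()[:6]
    try:
        values = [float(v) for v in fields]
    except ValueError:
        return None
    return values if len(values) == 6 else None


def mean_cell(lines):
    """Unweighted mean of all cells found in lines; returns (mean, count)."""
    cells = [c for c in (parse_cell(line) for line in lines) if c is not None]
    if not cells:
        raise ValueError("no UNIT_CELL_CONSTANTS lines found")
    n = len(cells)
    return [sum(c[i] for c in cells) / n for i in range(6)], n


# Two cells copied from the CELLPARM.LP listing above.
sample = [
    "!UNIT_CELL_CONSTANTS=    38.311    79.096    79.107  90.000  90.000  90.000",
    "!UNIT_CELL_CONSTANTS=    38.292    79.081    79.078  90.000  90.000  90.000",
]
mean, count = mean_cell(sample)
print(count, " ".join(f"{v:.3f}" for v in mean))
```

In practice the input lines would come from the output of the grep command shown above, one line per dataset.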
Now we run xscale with the following XSCALE.INP:
<pre>
UNIT_CELL_CONSTANTS=38.3 79.1 79.1 90 90 90
SPACE_GROUP_NUMBER=19
OUTPUT_FILE=temp.ahkl
INPUT_FILE=../xtal001/XDS_ASCII.HKL
INPUT_FILE=../xtal002/XDS_ASCII.HKL
INPUT_FILE=../xtal003/XDS_ASCII.HKL
INPUT_FILE=../xtal004/XDS_ASCII.HKL
INPUT_FILE=../xtal005/XDS_ASCII.HKL
INPUT_FILE=../xtal006/XDS_ASCII.HKL
INPUT_FILE=../xtal007/XDS_ASCII.HKL
INPUT_FILE=../xtal008/XDS_ASCII.HKL
INPUT_FILE=../xtal009/XDS_ASCII.HKL
INPUT_FILE=../xtal010/XDS_ASCII.HKL
INPUT_FILE=../xtal011/XDS_ASCII.HKL
INPUT_FILE=../xtal012/XDS_ASCII.HKL
INPUT_FILE=../xtal013/XDS_ASCII.HKL
INPUT_FILE=../xtal014/XDS_ASCII.HKL
INPUT_FILE=../xtal015/XDS_ASCII.HKL
INPUT_FILE=../xtal016/XDS_ASCII.HKL
INPUT_FILE=../xtal017/XDS_ASCII.HKL
INPUT_FILE=../xtal018/XDS_ASCII.HKL
INPUT_FILE=../xtal019/XDS_ASCII.HKL
INPUT_FILE=../xtal020/XDS_ASCII.HKL
INPUT_FILE=../xtal021/XDS_ASCII.HKL
INPUT_FILE=../xtal022/XDS_ASCII.HKL
INPUT_FILE=../xtal023/XDS_ASCII.HKL
INPUT_FILE=../xtal024/XDS_ASCII.HKL
INPUT_FILE=../xtal025/XDS_ASCII.HKL
INPUT_FILE=../xtal026/XDS_ASCII.HKL
INPUT_FILE=../xtal027/XDS_ASCII.HKL
INPUT_FILE=../xtal028/XDS_ASCII.HKL
INPUT_FILE=../xtal029/XDS_ASCII.HKL
INPUT_FILE=../xtal030/XDS_ASCII.HKL
INPUT_FILE=../xtal031/XDS_ASCII.HKL
INPUT_FILE=../xtal032/XDS_ASCII.HKL
INPUT_FILE=../xtal033/XDS_ASCII.HKL
INPUT_FILE=../xtal034/XDS_ASCII.HKL
INPUT_FILE=../xtal035/XDS_ASCII.HKL
INPUT_FILE=../xtal036/XDS_ASCII.HKL
INPUT_FILE=../xtal037/XDS_ASCII.HKL
INPUT_FILE=../xtal038/XDS_ASCII.HKL
INPUT_FILE=../xtal039/XDS_ASCII.HKL
INPUT_FILE=../xtal040/XDS_ASCII.HKL
INPUT_FILE=../xtal041/XDS_ASCII.HKL
INPUT_FILE=../xtal042/XDS_ASCII.HKL
INPUT_FILE=../xtal043/XDS_ASCII.HKL
INPUT_FILE=../xtal044/XDS_ASCII.HKL
INPUT_FILE=../xtal045/XDS_ASCII.HKL
INPUT_FILE=../xtal046/XDS_ASCII.HKL
INPUT_FILE=../xtal047/XDS_ASCII.HKL
INPUT_FILE=../xtal048/XDS_ASCII.HKL
INPUT_FILE=../xtal049/XDS_ASCII.HKL
INPUT_FILE=../xtal050/XDS_ASCII.HKL
INPUT_FILE=../xtal051/XDS_ASCII.HKL
INPUT_FILE=../xtal052/XDS_ASCII.HKL
INPUT_FILE=../xtal053/XDS_ASCII.HKL
INPUT_FILE=../xtal054/XDS_ASCII.HKL
INPUT_FILE=../xtal055/XDS_ASCII.HKL
INPUT_FILE=../xtal056/XDS_ASCII.HKL
INPUT_FILE=../xtal057/XDS_ASCII.HKL
INPUT_FILE=../xtal058/XDS_ASCII.HKL
INPUT_FILE=../xtal059/XDS_ASCII.HKL
INPUT_FILE=../xtal060/XDS_ASCII.HKL
INPUT_FILE=../xtal061/XDS_ASCII.HKL
INPUT_FILE=../xtal062/XDS_ASCII.HKL
INPUT_FILE=../xtal063/XDS_ASCII.HKL
INPUT_FILE=../xtal064/XDS_ASCII.HKL
INPUT_FILE=../xtal065/XDS_ASCII.HKL
INPUT_FILE=../xtal066/XDS_ASCII.HKL
INPUT_FILE=../xtal067/XDS_ASCII.HKL
INPUT_FILE=../xtal068/XDS_ASCII.HKL
INPUT_FILE=../xtal069/XDS_ASCII.HKL
INPUT_FILE=../xtal070/XDS_ASCII.HKL
INPUT_FILE=../xtal071/XDS_ASCII.HKL
INPUT_FILE=../xtal072/XDS_ASCII.HKL
INPUT_FILE=../xtal073/XDS_ASCII.HKL
INPUT_FILE=../xtal074/XDS_ASCII.HKL
INPUT_FILE=../xtal075/XDS_ASCII.HKL
INPUT_FILE=../xtal076/XDS_ASCII.HKL
INPUT_FILE=../xtal077/XDS_ASCII.HKL
INPUT_FILE=../xtal078/XDS_ASCII.HKL
INPUT_FILE=../xtal079/XDS_ASCII.HKL
INPUT_FILE=../xtal080/XDS_ASCII.HKL
INPUT_FILE=../xtal081/XDS_ASCII.HKL
INPUT_FILE=../xtal082/XDS_ASCII.HKL
INPUT_FILE=../xtal083/XDS_ASCII.HKL
INPUT_FILE=../xtal084/XDS_ASCII.HKL
INPUT_FILE=../xtal085/XDS_ASCII.HKL
INPUT_FILE=../xtal086/XDS_ASCII.HKL
INPUT_FILE=../xtal087/XDS_ASCII.HKL
INPUT_FILE=../xtal088/XDS_ASCII.HKL
INPUT_FILE=../xtal089/XDS_ASCII.HKL
INPUT_FILE=../xtal090/XDS_ASCII.HKL
INPUT_FILE=../xtal091/XDS_ASCII.HKL
INPUT_FILE=../xtal092/XDS_ASCII.HKL
INPUT_FILE=../xtal093/XDS_ASCII.HKL
INPUT_FILE=../xtal094/XDS_ASCII.HKL
INPUT_FILE=../xtal095/XDS_ASCII.HKL
INPUT_FILE=../xtal096/XDS_ASCII.HKL
INPUT_FILE=../xtal097/XDS_ASCII.HKL
INPUT_FILE=../xtal098/XDS_ASCII.HKL
INPUT_FILE=../xtal099/XDS_ASCII.HKL
INPUT_FILE=../xtal100/XDS_ASCII.HKL
</pre>
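A file with this many near-identical lines does not have to be typed by hand. As a minimal sketch, the Python function below (the name xscale_inp and its parameters are made up for this example) builds the same text from a list of dataset numbers, assuming the directory layout ../xtalNNN/XDS_ASCII.HKL used above.

```python
def xscale_inp(numbers, output_file="temp.ahkl",
               cell="38.3 79.1 79.1 90 90 90", space_group=19):
    """Build the text of an XSCALE.INP for the given dataset numbers.

    Pass cell=None and space_group=None to leave out the two header lines.
    """
    lines = []
    if cell is not None:
        lines.append(f"UNIT_CELL_CONSTANTS={cell}")
    if space_group is not None:
        lines.append(f"SPACE_GROUP_NUMBER={space_group}")
    lines.append(f"OUTPUT_FILE={output_file}")
    # One INPUT_FILE line per dataset, zero-padded to three digits.
    lines.extend(f"INPUT_FILE=../xtal{n:03d}/XDS_ASCII.HKL" for n in numbers)
    return "\n".join(lines) + "\n"


text = xscale_inp(range(1, 101))
print(text.splitlines()[3])
print(len(text.splitlines()), "lines")
# To use it, write the text to XSCALE.INP in the directory where xscale is run:
# with open("XSCALE.INP", "w") as handle:
#     handle.write(text)
```

The same function can produce a shorter file for a subset of datasets by passing only the selected numbers.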
xscale writes XSCALE.LP, which lists the correlation coefficients of every dataset with every other dataset (4950 pairs for 100 datasets)! The order of listing of the correlation coefficients is such that it turns out that it was a good choice to have xtal100 as the REFERENCE_DATA_SET, because we find this list:
 CORRELATIONS BETWEEN INPUT DATA SETS AFTER CORRECTIONS
 DATA SETS NUMBER OF COMMON CORRELATION RATIO OF COMMON B-FACTOR
 #i #j REFLECTIONS BETWEEN i,j INTENSITIES (i/j) BETWEEN i,j
with these 99 lines:
 1 100 12 0.601 0.8200 0.0085
 2 100 24 0.998 0.9001 0.5637
 3 100 16 0.990 0.9216 -0.2983
 4 100 16 0.239 1.9141 -0.2253
 5 100 31 0.996 0.9231 0.3755
 6 100 22 0.997 0.9412 0.2726
 7 100 11 0.976 0.8848 -0.1225
 8 100 5 0.967 0.9166 0.0435
 9 100 34 0.160 1.2885 0.0774
 10 100 11 0.860 2.9740 -0.2614
 11 100 8 0.997 0.8732 0.6032
 12 100 8 0.998 1.0145 -0.4169
 13 100 22 1.000 0.9313 0.1664
 14 100 8 0.900 0.8040 0.2744
 15 100 10 0.986 0.9510 0.1738
 16 100 1 0.000 0.9685 0.0000
 17 100 14 0.991 0.8700 0.3395
 18 100 7 0.997 1.0546 -0.2113
 19 100 23 1.000 1.0451 -0.0246
 20 100 24 0.266 0.6392 0.1091
 21 100 20 0.995 0.8529 0.6281
 22 100 12 0.072 0.9376 -0.0406
 23 100 19 0.999 0.9366 0.0670
 24 100 14 0.998 1.0986 -0.7853
 25 100 4 0.939 1.0483 -0.0886
 26 100 26 0.993 0.9633 0.0813
 27 100 30 0.990 0.9782 -0.0191
 28 100 30 0.995 0.9124 -0.0781
 29 100 13 0.488 2.1279 -0.2548
 30 100 18 0.283 1.2442 0.0585
 31 100 23 0.995 0.9249 0.4751
 32 100 22 0.293 2.7799 -0.1715
 33 100 7 1.000 1.0706 -0.2011
 34 100 6 0.987 0.9888 -0.0007
 35 100 8 0.989 0.9895 -0.1751
 36 100 23 0.985 0.8494 0.3038
 37 100 8 0.966 0.7378 -0.0108
 38 100 7 1.000 1.1335 -0.0927
 39 100 11 0.982 0.9994 -0.5811
 40 100 16 0.994 0.7549 0.8741
 41 100 12 0.986 0.9478 -0.4168
 42 100 11 0.994 0.8285 0.7668
 43 100 9 0.997 0.9595 -0.2219
 44 100 15 1.000 0.8666 0.2884
 45 100 13 0.517 1.6433 0.0034
 46 100 13 0.296 1.4431 -0.0938
 47 100 18 0.857 0.9734 0.3337
 48 100 13 0.999 0.9627 0.2611
 49 100 22 0.991 0.8798 0.2976
 50 100 14 0.999 1.1206 -1.0748
 51 100 10 0.999 0.9296 0.5194
 52 100 8 0.899 1.3901 0.0190
 53 100 24 0.998 1.0383 -0.3979
 54 100 7 0.998 1.1332 -0.5519
 55 100 8 0.993 0.9258 -0.0688
 56 100 19 0.992 0.9138 0.0326
 57 100 5 0.994 0.9209 -0.2679
 58 100 22 0.996 0.8591 0.6813
 59 100 7 0.650 1.5471 -0.0597
 60 100 21 0.995 0.9013 0.0722
 61 100 16 0.998 0.8689 0.4326
 62 100 1 0.002 0.7717 0.0000
 63 100 6 0.995 0.9921 0.0243
 64 100 14 0.998 0.9398 -0.5243
 65 100 12 0.515 1.7489 -0.0858
 66 100 17 0.999 0.9457 0.0390
 67 100 9 0.840 0.7706 0.5165
 68 100 6 0.969 0.9477 0.0164
 69 100 12 0.999 0.9503 -0.1039
 70 100 10 0.949 0.8026 -0.1336
 71 100 4 0.689 2.0681 0.0039
 72 100 29 0.999 1.1291 -0.6696
 73 100 5 -0.316 0.4326 0.0269
 74 100 13 -0.233 1.4081 -0.0231
 75 100 21 0.991 0.9722 -0.0179
 76 100 27 0.996 0.9971 -0.7051
 77 100 26 0.090 0.9911 0.0042
 78 100 33 0.999 1.0320 -0.1129
 79 100 19 0.990 0.9761 -0.1856
 80 100 9 -0.405 0.6967 0.0026
 81 100 37 1.000 0.9449 -0.3532
 82 100 39 0.998 0.9688 -0.3311
 83 100 16 0.996 0.9339 0.3853
 84 100 4 0.999 0.8844 0.1728
 85 100 0 0.000 1.0000 0.0000
 86 100 4 1.000 1.0431 -0.8447
 87 100 20 0.998 0.9432 0.0283
 88 100 16 0.999 0.9415 0.2914
 89 100 39 0.995 0.9713 -0.2225
 90 100 15 0.992 1.0039 0.0773
 91 100 7 0.997 1.0149 -0.4369
 92 100 15 0.713 0.9845 -0.0447
 93 100 21 0.249 0.8322 -0.0360
 94 100 34 0.997 0.9991 -0.1059
 95 100 6 0.582 0.6511 0.1327
 96 100 8 0.988 0.8068 0.5740
 97 100 16 0.989 0.9331 0.4112
 98 100 13 0.974 0.9556 0.0624
 99 100 15 0.400 0.5817 -0.0325
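Rows like these can also be filtered by a short program instead of by eye. The Python sketch below assumes that each correlation row consists of exactly six whitespace-separated numbers, as in the listing above; the thresholds (correlation of at least 0.95 and at least 5 common reflections) are illustrative choices, not values prescribed by XSCALE, and the function names are hypothetical.

```python
def parse_correlations(text):
    """Extract (i, j, n_common, cc, ratio, b_factor) tuples from XSCALE.LP text."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue
        try:
            i, j, n_common = int(fields[0]), int(fields[1]), int(fields[2])
            cc, ratio, b_factor = (float(v) for v in fields[3:])
        except ValueError:
            continue  # header or unrelated line
        rows.append((i, j, n_common, cc, ratio, b_factor))
    return rows


def select_for_reference(rows, reference=100, min_cc=0.95, min_common=5):
    """Dataset numbers that agree well with the reference dataset."""
    return sorted(i for (i, j, n_common, cc, _, _) in rows
                  if j == reference and cc >= min_cc and n_common >= min_common)


# A few rows copied from the listing above.
sample = """
 #i #j REFLECTIONS BETWEEN i,j INTENSITIES (i/j) BETWEEN i,j
 1 100 12 0.601 0.8200 0.0085
 2 100 24 0.998 0.9001 0.5637
 3 100 16 0.990 0.9216 -0.2983
 4 100 16 0.239 1.9141 -0.2253
 16 100 1 0.000 0.9685 0.0000
"""
print(select_for_reference(parse_correlations(sample)))
```

A high correlation based on very few common reflections carries little weight, which is why the sketch also requires a minimum number of common reflections.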
We note that there are many datasets with high correlation coefficients. We use some of those to generate the REFERENCE_DATA_SET for the second round; XSCALE.INP is now:
 OUTPUT_FILE=../reference.ahkl
 INPUT_FILE=../xtal002/XDS_ASCII.HKL
 INPUT_FILE=../xtal003/XDS_ASCII.HKL
 INPUT_FILE=../xtal005/XDS_ASCII.HKL
 INPUT_FILE=../xtal006/XDS_ASCII.HKL
 INPUT_FILE=../xtal007/XDS_ASCII.HKL
 INPUT_FILE=../xtal008/XDS_ASCII.HKL
 INPUT_FILE=../xtal011/XDS_ASCII.HKL
 INPUT_FILE=../xtal012/XDS_ASCII.HKL
 INPUT_FILE=../xtal013/XDS_ASCII.HKL
 INPUT_FILE=../xtal015/XDS_ASCII.HKL
 INPUT_FILE=../xtal017/XDS_ASCII.HKL
 INPUT_FILE=../xtal018/XDS_ASCII.HKL
 INPUT_FILE=../xtal019/XDS_ASCII.HKL
 INPUT_FILE=../xtal100/XDS_ASCII.HKL
We could have included more datasets, but these 14 already provide a completeness of 34.5%:
 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION NUMBER OF REFLECTIONS COMPLETENESS R-FACTOR R-FACTOR COMPARED I/SIGMA R-meas Rmrgd-F Anomal SigAno Nano
 LIMIT OBSERVED UNIQUE POSSIBLE OF DATA observed expected Corr
 8.05 111 92 304 30.3% 3.1% 4.2% 34 17.02 4.1% 3.7% 0% 0.000 0
 5.69 198 161 515 31.3% 3.5% 3.4% 70 16.78 4.8% 3.6% 0% 0.000 0
 4.65 289 230 639 36.0% 3.2% 3.5% 109 16.77 4.4% 3.8% 0% 0.000 0
 4.03 354 267 753 35.5% 3.4% 3.6% 151 18.70 4.5% 3.1% -40% 1.012 2
 3.60 367 287 840 34.2% 2.4% 3.6% 147 17.35 3.2% 3.1% 0% 0.000 0
 3.29 408 326 919 35.5% 3.7% 3.6% 158 16.91 5.1% 4.0% 0% 0.000 0
 3.04 422 324 987 32.8% 3.8% 3.9% 180 14.95 5.1% 4.0% 0% 0.000 0
 2.85 498 387 1066 36.3% 5.2% 4.6% 212 12.72 7.1% 7.3% 0% 0.000 0
 2.68 523 402 1124 35.8% 5.5% 5.4% 219 11.28 7.4% 7.2% 0% 0.000 0
 2.55 512 399 1174 34.0% 5.8% 6.0% 210 9.98 7.9% 7.6% 0% 0.000 0
 2.43 558 426 1263 33.7% 8.7% 8.6% 237 8.37 11.7% 12.6% -100% 0.829 2
 2.32 589 446 1287 34.7% 8.1% 9.0% 261 8.05 11.0% 14.0% 61% 0.690 3
 2.23 621 470 1350 34.8% 9.6% 10.4% 276 7.52 12.9% 16.8% 0% 0.000 0
 2.15 653 487 1380 35.3% 8.0% 8.8% 298 7.70 10.8% 13.5% -2% 0.783 6
 2.08 624 493 1459 33.8% 11.6% 11.6% 247 6.57 16.0% 16.0% 0% 0.000 0
 2.01 660 510 1494 34.1% 11.3% 11.5% 271 6.16 15.0% 16.7% -100% 0.382 2
 1.95 697 535 1546 34.6% 13.1% 13.8% 295 5.34 17.7% 22.9% 0% 0.000 0
 1.90 765 576 1571 36.7% 15.9% 16.3% 351 5.12 21.7% 23.9% 0% 0.000 0
 1.85 751 563 1635 34.4% 21.7% 22.0% 339 3.80 29.3% 35.2% 0% 0.000 0
 1.80 697 531 1660 32.0% 24.5% 25.5% 298 3.51 33.1% 40.5% -11% 0.784 2
 total 10297 7912 22966 34.5% 5.6% 5.9% 4363 9.17 7.6% 11.5% -9% 0.741 24
Now we are ready to run our script "bootstrap.rc" a second time. Actually it would be enough to re-run only the CORRECT step, but since the whole script takes just 2 minutes we don't bother to change it. After this, we run xscale a third time, using the same XSCALE.INP as the first time. The result is:
 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION NUMBER OF REFLECTIONS COMPLETENESS R-FACTOR R-FACTOR COMPARED I/SIGMA R-meas Rmrgd-F Anomal SigAno Nano
 LIMIT OBSERVED UNIQUE POSSIBLE OF DATA observed expected Corr
 8.05 794 270 304 88.8% 4.4% 4.2% 729 23.94 5.1% 3.0% 76% 1.884 48
 5.69 1495 478 515 92.8% 4.6% 4.5% 1404 23.48 5.4% 3.3% 73% 1.633 80
 4.65 1936 598 639 93.6% 5.4% 5.3% 1827 24.31 6.3% 3.7% 66% 1.541 133
 4.03 2381 714 752 94.9% 4.5% 4.8% 2266 24.56 5.3% 3.2% 47% 1.157 151
 3.60 2536 786 841 93.5% 5.5% 5.8% 2409 23.59 6.6% 3.9% 46% 1.164 173
 3.29 2832 875 918 95.3% 5.5% 5.7% 2693 23.10 6.5% 3.8% 31% 1.013 189
 3.04 3132 916 987 92.8% 5.7% 5.9% 3014 21.78 6.7% 3.8% 19% 0.917 228
 2.85 3383 1014 1067 95.0% 7.1% 7.1% 3234 18.61 8.3% 5.7% 26% 0.963 233
 2.68 3688 1079 1126 95.8% 8.3% 8.2% 3545 16.88 9.7% 6.9% 16% 0.911 270
 2.55 3709 1109 1171 94.7% 9.6% 9.8% 3530 14.93 11.3% 8.5% 15% 0.855 252
 2.43 4037 1194 1266 94.3% 10.8% 11.5% 3855 12.86 12.7% 11.1% 9% 0.805 287
 2.32 4160 1217 1281 95.0% 11.7% 12.4% 3979 12.14 13.6% 10.1% 13% 0.886 312
 2.23 4349 1286 1354 95.0% 12.1% 12.9% 4181 11.73 14.3% 13.5% 8% 0.738 317
 2.15 4599 1324 1378 96.1% 13.6% 14.3% 4416 11.26 15.9% 12.9% 5% 0.841 341
 2.08 4726 1379 1459 94.5% 15.5% 16.6% 4548 9.98 18.1% 14.6% -3% 0.784 352
 2.01 4729 1419 1500 94.6% 15.6% 16.5% 4521 9.46 18.3% 16.4% 6% 0.818 338
 1.95 4980 1480 1544 95.9% 20.3% 20.3% 4782 8.20 23.9% 21.1% -2% 0.778 353
 1.90 5217 1511 1575 95.9% 22.7% 23.7% 5016 7.51 26.5% 23.6% -4% 0.740 391
 1.85 5232 1555 1626 95.6% 29.8% 31.0% 5015 5.91 34.9% 28.6% 5% 0.813 359
 1.80 5024 1511 1669 90.5% 33.5% 34.6% 4790 5.25 39.4% 36.9% -1% 0.767 347
 total 72939 21715 22972 94.5% 8.2% 8.5% 69754 13.36 9.7% 10.3% 16% 0.891 5154
so the data are practically complete, and actually quite good. The anomalous correlation (column "Anomal Corr") suggests that it may be possible to solve the structure from its anomalous signal.
We can find out the correct spacegroup (19 !) with "pointless xdsin temp.ahkl".
Now we do another round, since the completeness is so good. We can then identify those few datasets which are still not indexed in the right setting, and fix those manually. Only xtal085 made this necessary: it turned out that the indexing had not found the correct lattice, which was fixed with STRONG_PIXEL=6.
The final XSCALE.LP is then:
 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION NUMBER OF REFLECTIONS COMPLETENESS R-FACTOR R-FACTOR COMPARED I/SIGMA R-meas Rmrgd-F Anomal SigAno Nano
 LIMIT OBSERVED UNIQUE POSSIBLE OF DATA observed expected Corr
 8.05 804 276 316 87.3% 4.4% 4.2% 733 23.80 5.1% 3.1% 75% 1.899 49
 5.69 1509 481 520 92.5% 4.5% 4.4% 1416 23.61 5.2% 3.3% 75% 1.660 81
 4.65 1951 601 644 93.3% 4.3% 4.4% 1842 24.49 5.1% 3.3% 68% 1.579 134
 4.03 2402 715 755 94.7% 4.1% 4.4% 2289 24.75 4.8% 3.2% 44% 1.174 153
 3.60 2555 788 843 93.5% 4.0% 4.5% 2427 23.81 4.7% 3.2% 48% 1.169 179
 3.29 2862 877 921 95.2% 4.2% 4.7% 2724 23.35 5.0% 3.2% 31% 1.050 198
 3.04 3146 916 989 92.6% 5.0% 5.1% 3030 22.00 5.8% 4.0% 15% 0.897 231
 2.85 3399 1016 1070 95.0% 5.9% 6.1% 3251 18.75 7.0% 5.4% 28% 0.992 235
 2.68 3717 1081 1128 95.8% 7.2% 7.2% 3579 17.01 8.4% 7.1% 13% 0.883 274
 2.55 3724 1110 1174 94.5% 8.3% 8.6% 3543 15.03 9.7% 8.0% 15% 0.836 255
 2.43 4058 1196 1266 94.5% 9.9% 10.6% 3877 12.96 11.5% 10.3% 8% 0.811 291
 2.32 4190 1220 1283 95.1% 11.1% 11.8% 4013 12.21 12.9% 10.8% 11% 0.889 328
 2.23 4371 1288 1357 94.9% 11.5% 12.4% 4207 11.79 13.6% 12.6% 4% 0.757 318
 2.15 4626 1324 1378 96.1% 13.2% 13.9% 4444 11.33 15.4% 12.4% 8% 0.835 349
 2.08 4756 1383 1461 94.7% 15.2% 16.2% 4577 10.02 17.8% 14.2% -4% 0.771 356
 2.01 4755 1423 1503 94.7% 15.4% 16.1% 4543 9.51 18.1% 15.2% 5% 0.817 342
 1.95 4995 1480 1544 95.9% 20.1% 19.9% 4794 8.24 23.6% 20.2% -5% 0.787 359
 1.90 5242 1512 1577 95.9% 22.3% 23.2% 5034 7.55 26.1% 22.2% -1% 0.772 400
 1.85 5261 1552 1626 95.4% 29.6% 30.6% 5054 5.95 34.6% 28.3% 6% 0.828 365
 1.80 5066 1514 1672 90.6% 33.4% 34.4% 4829 5.25 39.2% 35.7% -1% 0.789 356
 total 73389 21753 23027 94.5% 7.4% 7.7% 70206 13.45 8.6% 9.8% 15% 0.898 5253
When inspecting the list of R-factors of each of the datasets, it becomes clear that some of them are really good, but others are mediocre.
== Optimizing the result ==
Revision as of 20:06, 12 March 2011
This is an exercise, devised by James Holton, which deals with merging of datasets that were obtained in the presence of strong radiation damage.
The datasets were actually simulated using his program MLFSOM. There are 100 of them, and they are in random orientations with respect to each other. Each dataset consists of 15 frames of 1 degree rotation.
The goal of data processing is to obtain a good and complete dataset. In this case, it is tempting to think about the possibility of only using the first frame of each dataset. This has three advantages:
- radiation damage does not lower the resolution
- the completeness should be adequate if the symmetry is at least orthorhombic
- a successful procedure could also serve for processing data from an X-ray Free Electron Laser (see the recent Nature paper at [1])
== Preparation ==
From visual inspection (using ADXV) we realize that the first frame of each dataset looks good (diffraction to 2 Å), the last looks bad (10 Å), and there is an obvious degradation from each frame to the next.
We have to get some idea about possible spacegroups first. This means processing some of the datasets. Let's choose "xtal100", the last one.
 generate_XDS.INP "../../Illuin/microfocus/xtal100_1_0??.img"
To maximize the number of reflections that should be used for spacegroup determination, the only changes to XDS.INP are:
 TEST_RESOLUTION_RANGE= 50 0 ! default is 10 4 ; we want all reflections instead
 DATA_RANGE= 1 1 ! R-factors involving more than 1 frame are meaningless
 ! with such strong radiation damage
We run "xds" and, after a few seconds, can inspect IDXREF.LP and CORRECT.LP. It turns out that the primitive cell is 38.3, 79.2, 79.2, 90, 90, 90, which is compatible with tetragonal spacegroups, or those with lower symmetry:
 LATTICE- BRAVAIS- QUALITY UNIT CELL CONSTANTS (ANGSTROEM & DEGREES) REINDEXING TRANSFORMATION
 CHARACTER LATTICE OF FIT a b c alpha beta gamma
 * 31 aP 0.0 38.3 79.2 79.2 90.0 90.0 90.0 1 0 0 0 0 1 0 0 0 0 1 0
 * 44 aP 0.1 38.3 79.2 79.2 90.0 90.0 90.0 -1 0 0 0 0 -1 0 0 0 0 1 0
 * 35 mP 0.4 79.2 38.3 79.2 90.0 90.0 90.0 0 1 0 0 1 0 0 0 0 0 -1 0
 * 33 mP 0.9 38.3 79.2 79.2 90.0 90.0 90.0 -1 0 0 0 0 -1 0 0 0 0 1 0
 * 34 mP 1.1 38.3 79.2 79.2 90.0 90.0 90.0 1 0 0 0 0 0 -1 0 0 1 0 0
 * 32 oP 1.2 38.3 79.2 79.2 90.0 90.0 90.0 -1 0 0 0 0 -1 0 0 0 0 1 0
 * 20 mC 1.2 112.0 111.9 38.3 90.0 90.0 90.0 0 1 1 0 0 1 -1 0 -1 0 0 0
 * 23 oC 1.4 111.9 112.0 38.3 90.0 90.0 90.0 0 -1 1 0 0 1 1 0 -1 0 0 0
 * 25 mC 1.4 111.9 112.0 38.3 90.0 90.0 90.0 0 -1 1 0 0 1 1 0 -1 0 0 0
 * 21 tP 2.2 79.2 79.2 38.3 90.0 90.0 90.0 0 -1 0 0 0 0 1 0 -1 0 0 0
 37 mC 249.8 162.9 38.3 79.2 90.0 90.0 76.4 -1 0 2 0 -1 0 0 0 0 -1 0 0
This table exists in both IDXREF.LP and CORRECT.LP. The next table in CORRECT.LP tells us the Rmeas of the starred (*) lattices:
 SPACE-GROUP UNIT CELL CONSTANTS UNIQUE Rmeas COMPARED LATTICE-
 NUMBER a b c alpha beta gamma CHARACTER
 5 112.0 111.9 38.3 90.0 90.0 90.0 973 0.0 0 20 mC
 75 79.2 79.2 38.3 90.0 90.0 90.0 961 93.5 12 21 tP
 89 79.2 79.2 38.3 90.0 90.0 90.0 946 30.9 27 21 tP
 21 111.9 112.0 38.3 90.0 90.0 90.0 965 31.6 8 23 oC
 5 111.9 112.0 38.3 90.0 90.0 90.0 970 77.9 3 25 mC
 1 38.3 79.2 79.2 90.0 90.0 90.0 973 0.0 0 31 aP
 16 38.3 79.2 79.2 90.0 90.0 90.0 954 6.8 19 32 oP
 3 79.2 38.3 79.2 90.0 90.0 90.0 968 5.4 5 35 mP
 3 38.3 79.2 79.2 90.0 90.0 90.0 966 5.2 7 33 mP
 3 38.3 79.2 79.2 90.0 90.0 90.0 966 10.7 7 34 mP
 1 38.3 79.2 79.2 90.0 90.0 90.0 973 0.0 0 44 aP
The tetragonal lattices are clearly unfavourable (Rmeas of 93.5% and 30.9% for spacegroups 75 and 89), whereas orthorhombic is good (Rmeas of 6.8% for spacegroup 16). We repeat this procedure with a few other datasets, and observe that the "orthorhombic hypothesis" is confirmed. E.g. with xtal001 we obtain:
 SPACE-GROUP UNIT CELL CONSTANTS UNIQUE Rmeas COMPARED LATTICE-
 NUMBER a b c alpha beta gamma CHARACTER
 5 111.9 111.9 38.3 90.0 90.0 90.0 939 119.8 5 20 mC
 75 79.1 79.1 38.3 90.0 90.0 90.0 939 47.0 5 21 tP
 89 79.1 79.1 38.3 90.0 90.0 90.0 865 21.6 79 21 tP
 21 111.9 111.9 38.3 90.0 90.0 90.0 939 119.8 5 23 oC
 5 111.9 111.9 38.3 90.0 90.0 90.0 939 119.8 5 25 mC
 1 38.3 79.1 79.1 90.0 90.0 90.0 944 0.0 0 31 aP
 16 38.3 79.1 79.1 90.0 90.0 90.0 875 6.3 69 32 oP
 3 79.1 38.3 79.1 90.0 90.0 90.0 944 0.0 0 35 mP
 3 38.3 79.1 79.1 90.0 90.0 90.0 875 6.3 69 33 mP
 3 38.3 79.1 79.1 90.0 90.0 90.0 944 0.0 0 34 mP
 1 38.3 79.1 79.1 90.0 90.0 90.0 944 0.0 0 44 aP
devising a bootstrap procedure
We have to realize that, since the b and c axes are equal, we can index each dataset in two non-equivalent ways. This is the same situation as occurs e.g. for spacegroups P3(x) and P4(x), and means that we'll have to use a REFERENCE_DATA_SET to get the right setting for each of the 100 datasets.
However, we cannot expect that all of the datasets have enough reflections in common with a given dataset. Thus, we have to update and enlarge the REFERENCE_DATA_SET after the first round, using those datasets that have reflections in common with the old REFERENCE_DATA_SET. Then in a second round, we can hopefully identify the correct setting for all datasets. After that, we can scale everything together.
first round of bootstrap
We choose xtal100 as the first reference, and move its XDS_ASCII.HKL to bootstrap/reference.ahkl. A script that goes through all datasets, produces XDS.INP, and runs xds is the following (note that we only REFINE(IDXREF)= ORIENTATION BEAM , and the same for REFINE(INTEGRATE), since it may be useful to keep the b and c axis exactly the same):
#!/bin/csh -f foreach f ( Illuin/microfocus/xtal*_1_001.img ) setenv x `echo $f | cut -c 19-25` echo processing $x rm -rf bootstrap/$x mkdir bootstrap/$x cd bootstrap/$x cat>XDS.INP<<EOF JOB= XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE CORRECT ORGX= 1511.2 ORGY= 1553.1 ! ORGX=1507 ORGY=1570 if BEAM is not refined DETECTOR_DISTANCE= 250 OSCILLATION_RANGE= 1 X-RAY_WAVELENGTH= 0.979338 NAME_TEMPLATE_OF_DATA_FRAMES=../../Illuin/microfocus/${x}_1_0??.img DATA_RANGE=1 1 SPOT_RANGE=1 1 REFERENCE_DATA_SET=../reference.ahkl TEST_RESOLUTION_RANGE= 50.0 2.0 ! for correlating with reference SPACE_GROUP_NUMBER=16 ! 0 if unknown UNIT_CELL_CONSTANTS= 38.3 79.1 79.1 90 90 90 ! mean of CORRECT outputs INCLUDE_RESOLUTION_RANGE=60 1.8 ! after CORRECT, insert high resol limit; re-run CORRECT TRUSTED_REGION=0.00 1. ! partially use corners of detectors; 1.41421=full use VALUE_RANGE_FOR_TRUSTED_DETECTOR_PIXELS=7000. 30000. ! often 8000 is ok MINIMUM_ZETA=0.05 ! integrate close to the Lorentz zone; 0.15 is default STRONG_PIXEL=5 MINIMUM_NUMBER_OF_PIXELS_IN_A_SPOT=3 ! default of 6 is sometimes too high REFINE(INTEGRATE)= ORIENTATION BEAM ! AXIS DISTANCE CELL REFINE(IDXREF)= ORIENTATION BEAM ! AXIS DISTANCE CELL ! parameters specifically for this detector and beamline: DETECTOR= ADSC MINIMUM_VALID_PIXEL_VALUE= 1 OVERLOAD= 65000 NX= 3072 NY= 3072 QX= 0.102539 QY= 0.102539 ! to make CORRECT happy if frames are unavailable DIRECTION_OF_DETECTOR_X-AXIS=1 0 0 DIRECTION_OF_DETECTOR_Y-AXIS=0 1 0 INCIDENT_BEAM_DIRECTION=0 0 1 ! 0.00203 -0.0065 1.02107 ! mean of CORRECT outputs ROTATION_AXIS=1 0 0 ! at e.g. SERCAT ID-22 this needs to be -1 0 0 FRACTION_OF_POLARIZATION=0.98 ! better value is provided by beamline staff! POLARIZATION_PLANE_NORMAL=0 1 0 EOF xds >& xds.log & sleep 1 cd ../.. end
Running this script takes 2 minutes. After this, it's a good idea to check whether the cell parameters are really what we assumed they are:
grep UNIT_CELL_CO xtal0[01]*/XDS_ASCII.HKL | cut -c24- > CELLPARM.INP cellparm cat CELLPARM.LP
and obtain:
A B C ALPHA BETA GAMMA WEIGHT 38.311 79.096 79.107 90.000 90.000 90.000 1.0 38.292 79.081 79.078 90.000 90.000 90.000 1.0 38.285 79.021 79.048 90.000 90.000 90.000 1.0 38.308 79.106 79.099 90.000 90.000 90.000 1.0 38.298 79.096 79.084 90.000 90.000 90.000 1.0 38.310 79.117 79.109 90.000 90.000 90.000 1.0 38.317 79.120 79.124 90.000 90.000 90.000 1.0 38.302 79.102 79.097 90.000 90.000 90.000 1.0 38.309 79.119 79.134 90.000 90.000 90.000 1.0 38.288 79.098 79.128 90.000 90.000 90.000 1.0 38.294 79.102 79.119 90.000 90.000 90.000 1.0 38.299 79.104 79.100 90.000 90.000 90.000 1.0 38.296 79.113 79.058 90.000 90.000 90.000 1.0 38.322 79.091 79.120 90.000 90.000 90.000 1.0 38.284 79.082 79.094 90.000 90.000 90.000 1.0 38.284 79.103 79.098 90.000 90.000 90.000 1.0 38.303 79.109 79.111 90.000 90.000 90.000 1.0 38.293 79.084 79.083 90.000 90.000 90.000 1.0 38.300 79.095 79.101 90.000 90.000 90.000 1.0
38.300 79.097 79.100 90.000 90.000 90.000 19.0
Why not use all datasets? The reason is that cellparm has a limit of 20 datasets!
Now we run xscale with the following XSCALE.INP :
 UNIT_CELL_CONSTANTS=38.3 79.1 79.1 90 90 90
 SPACE_GROUP_NUMBER=19
 OUTPUT_FILE=temp.ahkl
 INPUT_FILE=../xtal001/XDS_ASCII.HKL
 INPUT_FILE=../xtal002/XDS_ASCII.HKL
 INPUT_FILE=../xtal003/XDS_ASCII.HKL
 INPUT_FILE=../xtal004/XDS_ASCII.HKL
 INPUT_FILE=../xtal005/XDS_ASCII.HKL
 INPUT_FILE=../xtal006/XDS_ASCII.HKL
 INPUT_FILE=../xtal007/XDS_ASCII.HKL
 INPUT_FILE=../xtal008/XDS_ASCII.HKL
 INPUT_FILE=../xtal009/XDS_ASCII.HKL
 INPUT_FILE=../xtal010/XDS_ASCII.HKL
 INPUT_FILE=../xtal011/XDS_ASCII.HKL
 INPUT_FILE=../xtal012/XDS_ASCII.HKL
 INPUT_FILE=../xtal013/XDS_ASCII.HKL
 INPUT_FILE=../xtal014/XDS_ASCII.HKL
 INPUT_FILE=../xtal015/XDS_ASCII.HKL
 INPUT_FILE=../xtal016/XDS_ASCII.HKL
 INPUT_FILE=../xtal017/XDS_ASCII.HKL
 INPUT_FILE=../xtal018/XDS_ASCII.HKL
 INPUT_FILE=../xtal019/XDS_ASCII.HKL
 INPUT_FILE=../xtal020/XDS_ASCII.HKL
 INPUT_FILE=../xtal021/XDS_ASCII.HKL
 INPUT_FILE=../xtal022/XDS_ASCII.HKL
 INPUT_FILE=../xtal023/XDS_ASCII.HKL
 INPUT_FILE=../xtal024/XDS_ASCII.HKL
 INPUT_FILE=../xtal025/XDS_ASCII.HKL
 INPUT_FILE=../xtal026/XDS_ASCII.HKL
 INPUT_FILE=../xtal027/XDS_ASCII.HKL
 INPUT_FILE=../xtal028/XDS_ASCII.HKL
 INPUT_FILE=../xtal029/XDS_ASCII.HKL
 INPUT_FILE=../xtal030/XDS_ASCII.HKL
 INPUT_FILE=../xtal031/XDS_ASCII.HKL
 INPUT_FILE=../xtal032/XDS_ASCII.HKL
 INPUT_FILE=../xtal033/XDS_ASCII.HKL
 INPUT_FILE=../xtal034/XDS_ASCII.HKL
 INPUT_FILE=../xtal035/XDS_ASCII.HKL
 INPUT_FILE=../xtal036/XDS_ASCII.HKL
 INPUT_FILE=../xtal037/XDS_ASCII.HKL
 INPUT_FILE=../xtal038/XDS_ASCII.HKL
 INPUT_FILE=../xtal039/XDS_ASCII.HKL
 INPUT_FILE=../xtal040/XDS_ASCII.HKL
 INPUT_FILE=../xtal041/XDS_ASCII.HKL
 INPUT_FILE=../xtal042/XDS_ASCII.HKL
 INPUT_FILE=../xtal043/XDS_ASCII.HKL
 INPUT_FILE=../xtal044/XDS_ASCII.HKL
 INPUT_FILE=../xtal045/XDS_ASCII.HKL
 INPUT_FILE=../xtal046/XDS_ASCII.HKL
 INPUT_FILE=../xtal047/XDS_ASCII.HKL
 INPUT_FILE=../xtal048/XDS_ASCII.HKL
 INPUT_FILE=../xtal049/XDS_ASCII.HKL
 INPUT_FILE=../xtal050/XDS_ASCII.HKL
 INPUT_FILE=../xtal051/XDS_ASCII.HKL
 INPUT_FILE=../xtal052/XDS_ASCII.HKL
 INPUT_FILE=../xtal053/XDS_ASCII.HKL
 INPUT_FILE=../xtal054/XDS_ASCII.HKL
 INPUT_FILE=../xtal055/XDS_ASCII.HKL
 INPUT_FILE=../xtal056/XDS_ASCII.HKL
 INPUT_FILE=../xtal057/XDS_ASCII.HKL
 INPUT_FILE=../xtal058/XDS_ASCII.HKL
 INPUT_FILE=../xtal059/XDS_ASCII.HKL
 INPUT_FILE=../xtal060/XDS_ASCII.HKL
 INPUT_FILE=../xtal061/XDS_ASCII.HKL
 INPUT_FILE=../xtal062/XDS_ASCII.HKL
 INPUT_FILE=../xtal063/XDS_ASCII.HKL
 INPUT_FILE=../xtal064/XDS_ASCII.HKL
 INPUT_FILE=../xtal065/XDS_ASCII.HKL
 INPUT_FILE=../xtal066/XDS_ASCII.HKL
 INPUT_FILE=../xtal067/XDS_ASCII.HKL
 INPUT_FILE=../xtal068/XDS_ASCII.HKL
 INPUT_FILE=../xtal069/XDS_ASCII.HKL
 INPUT_FILE=../xtal070/XDS_ASCII.HKL
 INPUT_FILE=../xtal071/XDS_ASCII.HKL
 INPUT_FILE=../xtal072/XDS_ASCII.HKL
 INPUT_FILE=../xtal073/XDS_ASCII.HKL
 INPUT_FILE=../xtal074/XDS_ASCII.HKL
 INPUT_FILE=../xtal075/XDS_ASCII.HKL
 INPUT_FILE=../xtal076/XDS_ASCII.HKL
 INPUT_FILE=../xtal077/XDS_ASCII.HKL
 INPUT_FILE=../xtal078/XDS_ASCII.HKL
 INPUT_FILE=../xtal079/XDS_ASCII.HKL
 INPUT_FILE=../xtal080/XDS_ASCII.HKL
 INPUT_FILE=../xtal081/XDS_ASCII.HKL
 INPUT_FILE=../xtal082/XDS_ASCII.HKL
 INPUT_FILE=../xtal083/XDS_ASCII.HKL
 INPUT_FILE=../xtal084/XDS_ASCII.HKL
 INPUT_FILE=../xtal085/XDS_ASCII.HKL
 INPUT_FILE=../xtal086/XDS_ASCII.HKL
 INPUT_FILE=../xtal087/XDS_ASCII.HKL
 INPUT_FILE=../xtal088/XDS_ASCII.HKL
 INPUT_FILE=../xtal089/XDS_ASCII.HKL
 INPUT_FILE=../xtal090/XDS_ASCII.HKL
 INPUT_FILE=../xtal091/XDS_ASCII.HKL
 INPUT_FILE=../xtal092/XDS_ASCII.HKL
 INPUT_FILE=../xtal093/XDS_ASCII.HKL
 INPUT_FILE=../xtal094/XDS_ASCII.HKL
 INPUT_FILE=../xtal095/XDS_ASCII.HKL
 INPUT_FILE=../xtal096/XDS_ASCII.HKL
 INPUT_FILE=../xtal097/XDS_ASCII.HKL
 INPUT_FILE=../xtal098/XDS_ASCII.HKL
 INPUT_FILE=../xtal099/XDS_ASCII.HKL
 INPUT_FILE=../xtal100/XDS_ASCII.HKL
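Typing 100 INPUT_FILE lines by hand is tedious, so a file like the one above could be generated instead. The following is only a minimal sketch in Python: the function name `make_xscale_inp` and its parameters are hypothetical, and it assumes the directory layout ../xtal001 ... ../xtal100 used in this tutorial.

```python
def make_xscale_inp(n_datasets=100,
                    cell="38.3 79.1 79.1 90 90 90",
                    spacegroup=19,
                    output="temp.ahkl"):
    """Return the text of an XSCALE.INP that merges xtal001..xtalNNN.

    Assumes each dataset was processed in its own directory ../xtalNNN
    and left an XDS_ASCII.HKL there (the layout used in this tutorial).
    """
    lines = [
        f"UNIT_CELL_CONSTANTS={cell}",
        f"SPACE_GROUP_NUMBER={spacegroup}",
        f"OUTPUT_FILE={output}",
    ]
    # One INPUT_FILE line per dataset, numbered with three digits.
    lines += [f"INPUT_FILE=../xtal{i:03d}/XDS_ASCII.HKL"
              for i in range(1, n_datasets + 1)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the file contents; redirect to XSCALE.INP when running as a script.
    print(make_xscale_inp(), end="")
```

Run as a script, the output can be redirected into XSCALE.INP in the directory where xscale is started.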
xscale writes XSCALE.LP, which lists the 4950 correlation coefficients of every dataset with every other dataset! The correlation coefficients are listed in such an order that xtal100 turns out to have been a good choice of REFERENCE_DATA_SET, because we find this list:
 CORRELATIONS BETWEEN INPUT DATA SETS AFTER CORRECTIONS
 DATA SETS  NUMBER OF COMMON  CORRELATION   RATIO OF COMMON   B-FACTOR
  #i   #j     REFLECTIONS     BETWEEN i,j  INTENSITIES (i/j)  BETWEEN i,j
with these 99 lines:
    1  100          12           0.601            0.8200          0.0085
    2  100          24           0.998            0.9001          0.5637
    3  100          16           0.990            0.9216         -0.2983
    4  100          16           0.239            1.9141         -0.2253
    5  100          31           0.996            0.9231          0.3755
    6  100          22           0.997            0.9412          0.2726
    7  100          11           0.976            0.8848         -0.1225
    8  100           5           0.967            0.9166          0.0435
    9  100          34           0.160            1.2885          0.0774
   10  100          11           0.860            2.9740         -0.2614
   11  100           8           0.997            0.8732          0.6032
   12  100           8           0.998            1.0145         -0.4169
   13  100          22           1.000            0.9313          0.1664
   14  100           8           0.900            0.8040          0.2744
   15  100          10           0.986            0.9510          0.1738
   16  100           1           0.000            0.9685          0.0000
   17  100          14           0.991            0.8700          0.3395
   18  100           7           0.997            1.0546         -0.2113
   19  100          23           1.000            1.0451         -0.0246
   20  100          24           0.266            0.6392          0.1091
   21  100          20           0.995            0.8529          0.6281
   22  100          12           0.072            0.9376         -0.0406
   23  100          19           0.999            0.9366          0.0670
   24  100          14           0.998            1.0986         -0.7853
   25  100           4           0.939            1.0483         -0.0886
   26  100          26           0.993            0.9633          0.0813
   27  100          30           0.990            0.9782         -0.0191
   28  100          30           0.995            0.9124         -0.0781
   29  100          13           0.488            2.1279         -0.2548
   30  100          18           0.283            1.2442          0.0585
   31  100          23           0.995            0.9249          0.4751
   32  100          22           0.293            2.7799         -0.1715
   33  100           7           1.000            1.0706         -0.2011
   34  100           6           0.987            0.9888         -0.0007
   35  100           8           0.989            0.9895         -0.1751
   36  100          23           0.985            0.8494          0.3038
   37  100           8           0.966            0.7378         -0.0108
   38  100           7           1.000            1.1335         -0.0927
   39  100          11           0.982            0.9994         -0.5811
   40  100          16           0.994            0.7549          0.8741
   41  100          12           0.986            0.9478         -0.4168
   42  100          11           0.994            0.8285          0.7668
   43  100           9           0.997            0.9595         -0.2219
   44  100          15           1.000            0.8666          0.2884
   45  100          13           0.517            1.6433          0.0034
   46  100          13           0.296            1.4431         -0.0938
   47  100          18           0.857            0.9734          0.3337
   48  100          13           0.999            0.9627          0.2611
   49  100          22           0.991            0.8798          0.2976
   50  100          14           0.999            1.1206         -1.0748
   51  100          10           0.999            0.9296          0.5194
   52  100           8           0.899            1.3901          0.0190
   53  100          24           0.998            1.0383         -0.3979
   54  100           7           0.998            1.1332         -0.5519
   55  100           8           0.993            0.9258         -0.0688
   56  100          19           0.992            0.9138          0.0326
   57  100           5           0.994            0.9209         -0.2679
   58  100          22           0.996            0.8591          0.6813
   59  100           7           0.650            1.5471         -0.0597
   60  100          21           0.995            0.9013          0.0722
   61  100          16           0.998            0.8689          0.4326
   62  100           1           0.002            0.7717          0.0000
   63  100           6           0.995            0.9921          0.0243
   64  100          14           0.998            0.9398         -0.5243
   65  100          12           0.515            1.7489         -0.0858
   66  100          17           0.999            0.9457          0.0390
   67  100           9           0.840            0.7706          0.5165
   68  100           6           0.969            0.9477          0.0164
   69  100          12           0.999            0.9503         -0.1039
   70  100          10           0.949            0.8026         -0.1336
   71  100           4           0.689            2.0681          0.0039
   72  100          29           0.999            1.1291         -0.6696
   73  100           5          -0.316            0.4326          0.0269
   74  100          13          -0.233            1.4081         -0.0231
   75  100          21           0.991            0.9722         -0.0179
   76  100          27           0.996            0.9971         -0.7051
   77  100          26           0.090            0.9911          0.0042
   78  100          33           0.999            1.0320         -0.1129
   79  100          19           0.990            0.9761         -0.1856
   80  100           9          -0.405            0.6967          0.0026
   81  100          37           1.000            0.9449         -0.3532
   82  100          39           0.998            0.9688         -0.3311
   83  100          16           0.996            0.9339          0.3853
   84  100           4           0.999            0.8844          0.1728
   85  100           0           0.000            1.0000          0.0000
   86  100           4           1.000            1.0431         -0.8447
   87  100          20           0.998            0.9432          0.0283
   88  100          16           0.999            0.9415          0.2914
   89  100          39           0.995            0.9713         -0.2225
   90  100          15           0.992            1.0039          0.0773
   91  100           7           0.997            1.0149         -0.4369
   92  100          15           0.713            0.9845         -0.0447
   93  100          21           0.249            0.8322         -0.0360
   94  100          34           0.997            0.9991         -0.1059
   95  100           6           0.582            0.6511          0.1327
   96  100           8           0.988            0.8068          0.5740
   97  100          16           0.989            0.9331          0.4112
   98  100          13           0.974            0.9556          0.0624
   99  100          15           0.400            0.5817         -0.0325
Many datasets have a high correlation coefficient with xtal100. We use some of those to generate the REFERENCE_DATA_SET for the second round; XSCALE.INP is now:
 OUTPUT_FILE=../reference.ahkl
 INPUT_FILE=../xtal002/XDS_ASCII.HKL
 INPUT_FILE=../xtal003/XDS_ASCII.HKL
 INPUT_FILE=../xtal005/XDS_ASCII.HKL
 INPUT_FILE=../xtal006/XDS_ASCII.HKL
 INPUT_FILE=../xtal007/XDS_ASCII.HKL
 INPUT_FILE=../xtal008/XDS_ASCII.HKL
 INPUT_FILE=../xtal011/XDS_ASCII.HKL
 INPUT_FILE=../xtal012/XDS_ASCII.HKL
 INPUT_FILE=../xtal013/XDS_ASCII.HKL
 INPUT_FILE=../xtal015/XDS_ASCII.HKL
 INPUT_FILE=../xtal017/XDS_ASCII.HKL
 INPUT_FILE=../xtal018/XDS_ASCII.HKL
 INPUT_FILE=../xtal019/XDS_ASCII.HKL
 INPUT_FILE=../xtal100/XDS_ASCII.HKL
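Picking the well-correlated datasets out of the 99 correlation lines can also be automated. The sketch below is an illustration only: the function name `select_reference_candidates`, the cutoff of 0.95 and the minimum of one common reflection are assumptions and are not stated in this tutorial. Applied to the lines for datasets 1 to 19, this cutoff happens to reproduce the datasets chosen by hand above (the reference xtal100 itself has to be added separately).

```python
def select_reference_candidates(table_text, reference=100,
                                min_cc=0.95, min_common=1):
    """Return dataset numbers whose correlation with `reference` is >= min_cc.

    `table_text` holds lines of the form
        #i  #j  n_common  correlation  intensity_ratio  b_factor
    as found in the correlation table of XSCALE.LP. Header lines and
    anything else that does not parse as six numbers is skipped.
    The default thresholds are illustrative assumptions.
    """
    selected = []
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue
        try:
            i, j, n_common = int(fields[0]), int(fields[1]), int(fields[2])
            cc = float(fields[3])
        except ValueError:
            continue  # not a data row
        if j == reference and n_common >= min_common and cc >= min_cc:
            selected.append(i)
    return selected


# The first eight correlation lines from the table above.
sample = """\
  #i   #j     REFLECTIONS     BETWEEN i,j  INTENSITIES (i/j)  BETWEEN i,j
    1  100          12           0.601            0.8200          0.0085
    2  100          24           0.998            0.9001          0.5637
    3  100          16           0.990            0.9216         -0.2983
    4  100          16           0.239            1.9141         -0.2253
    5  100          31           0.996            0.9231          0.3755
    6  100          22           0.997            0.9412          0.2726
    7  100          11           0.976            0.8848         -0.1225
    8  100           5           0.967            0.9166          0.0435
"""

print(select_reference_candidates(sample))  # -> [2, 3, 5, 6, 7, 8]
```

A stricter cutoff, or a larger minimum number of common reflections, would give a smaller but more trustworthy reference set; with so few common reflections per pair, the coefficients themselves are noisy.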
We could have included more datasets, but these 14 already provide a completeness of 34.5%:
 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  Rmrgd-F  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr
     8.05         111      92       304       30.3%       3.1%      4.2%       34   17.02     4.1%     3.7%      0%   0.000      0
     5.69         198     161       515       31.3%       3.5%      3.4%       70   16.78     4.8%     3.6%      0%   0.000      0
     4.65         289     230       639       36.0%       3.2%      3.5%      109   16.77     4.4%     3.8%      0%   0.000      0
     4.03         354     267       753       35.5%       3.4%      3.6%      151   18.70     4.5%     3.1%    -40%   1.012      2
     3.60         367     287       840       34.2%       2.4%      3.6%      147   17.35     3.2%     3.1%      0%   0.000      0
     3.29         408     326       919       35.5%       3.7%      3.6%      158   16.91     5.1%     4.0%      0%   0.000      0
     3.04         422     324       987       32.8%       3.8%      3.9%      180   14.95     5.1%     4.0%      0%   0.000      0
     2.85         498     387      1066       36.3%       5.2%      4.6%      212   12.72     7.1%     7.3%      0%   0.000      0
     2.68         523     402      1124       35.8%       5.5%      5.4%      219   11.28     7.4%     7.2%      0%   0.000      0
     2.55         512     399      1174       34.0%       5.8%      6.0%      210    9.98     7.9%     7.6%      0%   0.000      0
     2.43         558     426      1263       33.7%       8.7%      8.6%      237    8.37    11.7%    12.6%   -100%   0.829      2
     2.32         589     446      1287       34.7%       8.1%      9.0%      261    8.05    11.0%    14.0%     61%   0.690      3
     2.23         621     470      1350       34.8%       9.6%     10.4%      276    7.52    12.9%    16.8%      0%   0.000      0
     2.15         653     487      1380       35.3%       8.0%      8.8%      298    7.70    10.8%    13.5%     -2%   0.783      6
     2.08         624     493      1459       33.8%      11.6%     11.6%      247    6.57    16.0%    16.0%      0%   0.000      0
     2.01         660     510      1494       34.1%      11.3%     11.5%      271    6.16    15.0%    16.7%   -100%   0.382      2
     1.95         697     535      1546       34.6%      13.1%     13.8%      295    5.34    17.7%    22.9%      0%   0.000      0
     1.90         765     576      1571       36.7%      15.9%     16.3%      351    5.12    21.7%    23.9%      0%   0.000      0
     1.85         751     563      1635       34.4%      21.7%     22.0%      339    3.80    29.3%    35.2%      0%   0.000      0
     1.80         697     531      1660       32.0%      24.5%     25.5%      298    3.51    33.1%    40.5%    -11%   0.784      2
    total       10297    7912     22966       34.5%       5.6%      5.9%     4363    9.17     7.6%    11.5%     -9%   0.741     24
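The COMPLETENESS column is the number of unique reflections divided by the number of possible reflections. As a quick check of the "total" line, here is a minimal sketch; the helper name `completeness` is hypothetical and not part of XDS or XSCALE.

```python
def completeness(unique, possible):
    """Percentage of the possible unique reflections that were measured."""
    if possible <= 0:
        raise ValueError("possible must be a positive number of reflections")
    return 100.0 * unique / possible


# "total" line of the table above: 7912 unique out of 22966 possible.
print(f"{completeness(7912, 22966):.1f}%")  # -> 34.5%
```

The same arithmetic applies to each resolution shell, for example 92 unique out of 304 possible reflections in the 8.05 Å shell.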
Now we are ready to run our script "bootstrap.rc" a second time. Actually, it would be enough to rerun only the CORRECT step, but since the whole script takes just 2 minutes we don't bother to change it. After this, we run xscale a third time, using the same XSCALE.INP as the first time. The result is:
 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  Rmrgd-F  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr
     8.05         794     270       304       88.8%       4.4%      4.2%      729   23.94     5.1%     3.0%     76%   1.884     48
     5.69        1495     478       515       92.8%       4.6%      4.5%     1404   23.48     5.4%     3.3%     73%   1.633     80
     4.65        1936     598       639       93.6%       5.4%      5.3%     1827   24.31     6.3%     3.7%     66%   1.541    133
     4.03        2381     714       752       94.9%       4.5%      4.8%     2266   24.56     5.3%     3.2%     47%   1.157    151
     3.60        2536     786       841       93.5%       5.5%      5.8%     2409   23.59     6.6%     3.9%     46%   1.164    173
     3.29        2832     875       918       95.3%       5.5%      5.7%     2693   23.10     6.5%     3.8%     31%   1.013    189
     3.04        3132     916       987       92.8%       5.7%      5.9%     3014   21.78     6.7%     3.8%     19%   0.917    228
     2.85        3383    1014      1067       95.0%       7.1%      7.1%     3234   18.61     8.3%     5.7%     26%   0.963    233
     2.68        3688    1079      1126       95.8%       8.3%      8.2%     3545   16.88     9.7%     6.9%     16%   0.911    270
     2.55        3709    1109      1171       94.7%       9.6%      9.8%     3530   14.93    11.3%     8.5%     15%   0.855    252
     2.43        4037    1194      1266       94.3%      10.8%     11.5%     3855   12.86    12.7%    11.1%      9%   0.805    287
     2.32        4160    1217      1281       95.0%      11.7%     12.4%     3979   12.14    13.6%    10.1%     13%   0.886    312
     2.23        4349    1286      1354       95.0%      12.1%     12.9%     4181   11.73    14.3%    13.5%      8%   0.738    317
     2.15        4599    1324      1378       96.1%      13.6%     14.3%     4416   11.26    15.9%    12.9%      5%   0.841    341
     2.08        4726    1379      1459       94.5%      15.5%     16.6%     4548    9.98    18.1%    14.6%     -3%   0.784    352
     2.01        4729    1419      1500       94.6%      15.6%     16.5%     4521    9.46    18.3%    16.4%      6%   0.818    338
     1.95        4980    1480      1544       95.9%      20.3%     20.3%     4782    8.20    23.9%    21.1%     -2%   0.778    353
     1.90        5217    1511      1575       95.9%      22.7%     23.7%     5016    7.51    26.5%    23.6%     -4%   0.740    391
     1.85        5232    1555      1626       95.6%      29.8%     31.0%     5015    5.91    34.9%    28.6%      5%   0.813    359
     1.80        5024    1511      1669       90.5%      33.5%     34.6%     4790    5.25    39.4%    36.9%     -1%   0.767    347
    total       72939   21715     22972       94.5%       8.2%      8.5%    69754   13.36     9.7%    10.3%     16%   0.891   5154
So the data are practically complete, and actually quite good. The anomalous correlation at low resolution suggests that it may be possible to solve the structure from the anomalous signal.
We can find out the correct spacegroup (19 !) with "pointless xdsin temp.ahkl".
Now we do another round, since the completeness is so good. We can then identify those few datasets which are still not indexed in the right setting, and fix those manually. Only xtal085 made this necessary: it turned out that the indexing had not found the correct lattice, which was fixed with STRONG_PIXEL=6.
The final XSCALE.LP is then:
 SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
 RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  Rmrgd-F  Anomal  SigAno   Nano
   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr
     8.05         804     276       316       87.3%       4.4%      4.2%      733   23.80     5.1%     3.1%     75%   1.899     49
     5.69        1509     481       520       92.5%       4.5%      4.4%     1416   23.61     5.2%     3.3%     75%   1.660     81
     4.65        1951     601       644       93.3%       4.3%      4.4%     1842   24.49     5.1%     3.3%     68%   1.579    134
     4.03        2402     715       755       94.7%       4.1%      4.4%     2289   24.75     4.8%     3.2%     44%   1.174    153
     3.60        2555     788       843       93.5%       4.0%      4.5%     2427   23.81     4.7%     3.2%     48%   1.169    179
     3.29        2862     877       921       95.2%       4.2%      4.7%     2724   23.35     5.0%     3.2%     31%   1.050    198
     3.04        3146     916       989       92.6%       5.0%      5.1%     3030   22.00     5.8%     4.0%     15%   0.897    231
     2.85        3399    1016      1070       95.0%       5.9%      6.1%     3251   18.75     7.0%     5.4%     28%   0.992    235
     2.68        3717    1081      1128       95.8%       7.2%      7.2%     3579   17.01     8.4%     7.1%     13%   0.883    274
     2.55        3724    1110      1174       94.5%       8.3%      8.6%     3543   15.03     9.7%     8.0%     15%   0.836    255
     2.43        4058    1196      1266       94.5%       9.9%     10.6%     3877   12.96    11.5%    10.3%      8%   0.811    291
     2.32        4190    1220      1283       95.1%      11.1%     11.8%     4013   12.21    12.9%    10.8%     11%   0.889    328
     2.23        4371    1288      1357       94.9%      11.5%     12.4%     4207   11.79    13.6%    12.6%      4%   0.757    318
     2.15        4626    1324      1378       96.1%      13.2%     13.9%     4444   11.33    15.4%    12.4%      8%   0.835    349
     2.08        4756    1383      1461       94.7%      15.2%     16.2%     4577   10.02    17.8%    14.2%     -4%   0.771    356
     2.01        4755    1423      1503       94.7%      15.4%     16.1%     4543    9.51    18.1%    15.2%      5%   0.817    342
     1.95        4995    1480      1544       95.9%      20.1%     19.9%     4794    8.24    23.6%    20.2%     -5%   0.787    359
     1.90        5242    1512      1577       95.9%      22.3%     23.2%     5034    7.55    26.1%    22.2%     -1%   0.772    400
     1.85        5261    1552      1626       95.4%      29.6%     30.6%     5054    5.95    34.6%    28.3%      6%   0.828    365
     1.80        5066    1514      1672       90.6%      33.4%     34.4%     4829    5.25    39.2%    35.7%     -1%   0.789    356
    total       73389   21753     23027       94.5%       7.4%      7.7%    70206   13.45     8.6%     9.8%     15%   0.898   5253
When inspecting the list of R-factors of each of the datasets it becomes clear that some of them are really good, but others are mediocre.