Simulated-1g1c: Difference between revisions


Revision as of 22:35, 12 March 2011

7,746 bytes added , 12 March 2011

no edit summary

Kay


@@ Line 156: / Line 156: @@
 .300    79.097    79.100    90.000    90.000    90.000      19.0
-Why not use all datasets? The reason is that cellparm has a limit of 20 datasets!
+Why not use all datasets? The reason is that cellparm has a limit of 20 datasets! But it seems to confirm that the cell axes are really 38.3, 79.1, 79.1.
 Now we run xscale with the following XSCALE.INP :
 <pre>
-UNIT_CELL_CONSTANTS=38.3 79.1 79.1  90 90 90
-SPACE_GROUP_NUMBER=19
 OUTPUT_FILE=temp.ahkl
@@ Line 273: / Line 271: @@
   DATA SETS  NUMBER OF COMMON  CORRELATION   RATIO OF COMMON   B-FACTOR
    #i   #j     REFLECTIONS     BETWEEN i,j  INTENSITIES (i/j)  BETWEEN i,j
-with these 99 lines:
+with these final 99 lines:
   100          12           0.601            0.8200         0.0085
   100          24           0.998            0.9001         0.5637
@@ Line 420: / Line 418: @@
      total       10297    7912     22966       34.5%       5.6%      5.9%     4363    9.17     7.6%    11.5%    -9%   0.741      24
-Now we are ready to run our script "bootstrap.rc" a second time. Actually it would be enough to run the CORRECT step but since it only takes 2 minutes we don't bother to change the script. After this, we run xscale a third time, using the same XSCALE.INP as the first time. The result is
+== Second round of bootstrap ==
+Now we are ready to run our script "bootstrap.rc" a second time. Running only the CORRECT step would actually be enough, but since the full run only takes 2 minutes we do not bother to change the script. After this, we run xscale a third time, using the same XSCALE.INP (with all its 100 INPUT_FILE= lines) as the first time. The result is
   SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
@@ Line 450: / Line 450: @@
 so the data are practically complete, and actually quite good. The anomalous signal appears strong enough that it may be possible to solve the structure with it.
-We can find out the correct spacegroup (19 !) with "pointless xdsin temp.ahkl".
+We can find out the correct spacegroup (19 !) with "pointless xdsin temp.ahkl", and adjust our script accordingly.
-Now we do another round, since the completeness is so good. We can then identify those few datasets which are still not indexed in the right setting, fix those manually. It was only xtal085 which made this necessary - it turned out that the indexing had not found the correct lattice, which was fixed with STRONG_PIXEL=6.
+Now we do another round, since the completeness is so good. We can then identify the few datasets which are still not indexed in the right setting, and fix them manually. Only xtal085 had a problem: the indexing had not found the correct lattice, which was fixed with STRONG_PIXEL=6.
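Each xscale round reads one INPUT_FILE= line per dataset, so XSCALE.INP is easiest to generate rather than type. The following is only a sketch in POSIX sh: the function name make_xscale_inp and the path layout xtalNNN/XDS_ASCII.HKL are assumptions (only the name xtal085 and OUTPUT_FILE=temp.ahkl appear on this page), so adjust them to the actual directory structure.

```shell
#!/bin/sh
# make_xscale_inp N: print an XSCALE.INP with one INPUT_FILE= line per dataset.
# Hypothetical helper; the xtalNNN/XDS_ASCII.HKL layout is an assumption.
make_xscale_inp() {
    echo 'OUTPUT_FILE=temp.ahkl'
    i=1
    while [ "$i" -le "$1" ]; do
        # Three-digit numbering, as in the dataset name xtal085
        printf 'INPUT_FILE=xtal%03d/XDS_ASCII.HKL\n' "$i"
        i=$((i + 1))
    done
}

# Count the INPUT_FILE= lines that would be written for 100 datasets
make_xscale_inp 100 | grep -c '^INPUT_FILE='
```

Redirecting the function's output to XSCALE.INP (make_xscale_inp 100 > XSCALE.INP) would give a file that can be reused unchanged for every round.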
 The final XSCALE.LP is then:
@@ Line 485: / Line 485: @@
 == Optimizing the result ==
+One way to improve XDS' knowledge of the geometry is to use all 15 frames for indexing, while still integrating only frame 1. This is easily accomplished by changing the following lines in the script:
+ JOB=XYCORR INIT COLSPOT IDXREF DEFPIX
+ DATA_RANGE=1 15
+ SPOT_RANGE=1 15
+and by using, instead of "xds >& xds.log &", the line "../../run_xds.rc &" with the following run_xds.rc :
+<pre>
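+#!/bin/csh -f
+# first pass: the jobs up to DEFPIX, using the frame ranges set in XDS.INP
+xds
+# drop the old DATA_RANGE and JOB lines (the pattern matches any line containing JOB)
+egrep -v 'DATA_RANGE|JOB' XDS.INP >x
+# second pass: integrate and scale frame 1 only
+echo JOB=INTEGRATE CORRECT >XDS.INP
+echo DATA_RANGE=1 1 >> XDS.INP
+cat x >> XDS.INP
+xds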
+</pre>
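To see what the rewriting step of run_xds.rc does to XDS.INP without running xds, the same filter can be tried on a toy file. This is a sketch in POSIX sh rather than csh; the helper name rewrite_xds_inp and the three-line toy input are assumptions made for illustration, and a real XDS.INP contains many more keywords.

```shell
#!/bin/sh
# rewrite_xds_inp FILE: replace the JOB and DATA_RANGE settings in FILE so
# that a second xds run only integrates and scales frame 1.
# Hypothetical helper, not part of XDS.
rewrite_xds_inp() {
    inp=$1
    tmp=$(mktemp "${TMPDIR:-/tmp}/xdsinp.XXXXXX") || return 1
    # Keep every line except those containing DATA_RANGE or JOB
    grep -E -v 'DATA_RANGE|JOB' "$inp" > "$tmp"
    {
        echo 'JOB=INTEGRATE CORRECT'
        echo 'DATA_RANGE=1 1'
        cat "$tmp"
    } > "$inp"
    rm -f "$tmp"
}

# Toy input holding only the three lines changed in the script above
demo_dir=$(mktemp -d "${TMPDIR:-/tmp}/xdsdemo.XXXXXX")
cat > "$demo_dir/XDS.INP" <<'EOF'
JOB=XYCORR INIT COLSPOT IDXREF DEFPIX
DATA_RANGE=1 15
SPOT_RANGE=1 15
EOF

rewrite_xds_inp "$demo_dir/XDS.INP"
cat "$demo_dir/XDS.INP"
```

The SPOT_RANGE line survives the filter, which does no harm because the spot search is not repeated in the second pass.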
+Furthermore, it seems sensible to change "sleep 1" to "sleep 5", because each COLSPOT now has to look at 15 frames instead of one and therefore takes a little longer. Indeed, the result is a bit better:
+ SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
+ RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  Rmrgd-F  Anomal  SigAno   Nano
+   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr
+.05         798     274       304       90.1%       4.4%      4.2%      726   23.88     5.2%     3.1%    71%   1.932      49
+.69        1514     480       515       93.2%       4.5%      4.5%     1421   23.66     5.3%     3.4%    76%   1.670      83
+.65        1951     599       639       93.7%       4.3%      4.4%     1845   24.57     5.0%     3.3%    67%   1.561     139
+.03        2399     713       753       94.7%       4.1%      4.5%     2289   24.76     4.8%     3.1%    44%   1.176     154
+.60        2546     786       840       93.6%       3.9%      4.5%     2417   23.78     4.6%     3.1%    46%   1.127     175
+.29        2864     876       919       95.3%       4.2%      4.7%     2729   23.35     4.9%     3.2%    38%   1.018     199
+.04        3154     918       987       93.0%       5.0%      5.2%     3037   21.98     5.8%     3.9%    18%   0.922     231
+.85        3387    1015      1066       95.2%       5.9%      6.1%     3235   18.74     7.0%     5.2%    26%   0.992     235
+.68        3724    1082      1126       96.1%       7.2%      7.2%     3583   17.03     8.4%     6.7%    15%   0.890     278
+.55        3720    1111      1172       94.8%       8.3%      8.6%     3536   15.02     9.7%     8.1%    14%   0.857     255
+.43        4079    1198      1267       94.6%       9.8%     10.6%     3898   12.96    11.5%    10.3%     9%   0.781     290
+.32        4199    1221      1283       95.2%      11.1%     11.7%     4024   12.21    12.9%    10.8%    12%   0.911     331
+.23        4365    1282      1350       95.0%      11.4%     12.2%     4205   11.87    13.4%    12.6%     3%   0.729     319
+.15        4651    1332      1386       96.1%      13.3%     13.9%     4468   11.30    15.5%    12.5%     5%   0.821     354
+.08        4745    1380      1455       94.8%      15.0%     16.0%     4569   10.04    17.6%    14.0%    -1%   0.760     358
+.01        4744    1418      1496       94.8%      15.4%     16.0%     4531    9.50    18.1%    16.3%     5%   0.820     343
+.95        5019    1487      1550       95.9%      19.6%     19.7%     4813    8.27    23.0%    19.7%    -1%   0.765     359
+.90        5210    1504      1571       95.7%      21.9%     22.9%     5007    7.53    25.6%    22.8%    -6%   0.740     399
+.85        5272    1561      1633       95.6%      29.1%     30.1%     5054    5.98    34.1%    28.8%     4%   0.801     366
+.80        5054    1505      1659       90.7%      33.2%     34.1%     4822    5.25    38.9%    35.2%    -1%   0.790     354
+    total       73395   21742     22971       94.6%       7.3%      7.7%    70209   13.46     8.6%     9.8%    16%   0.890    5271
+but there does not appear to be a "magic bullet" that would produce much better data than the quick bootstrap approach.
+== Solving the structure ==
+First, we repeat xscale after inserting FRIEDEL'S_LAW=FALSE into XSCALE.INP . This gives us
+       NOTE:      Friedel pairs are treated as different reflections.
+ SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
+ RESOLUTION     NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA   R-meas  Rmrgd-F  Anomal  SigAno   Nano
+   LIMIT     OBSERVED  UNIQUE  POSSIBLE     OF DATA   observed  expected                                      Corr
+.05         804     382       476       80.3%       3.1%      3.4%      665   24.13     3.9%     2.7%    81%   2.507      50
+.69        1527     723       882       82.0%       3.4%      3.6%     1251   22.48     4.2%     3.1%    85%   2.223      87
+.65        1956     938      1136       82.6%       3.4%      3.6%     1602   22.73     4.3%     3.0%    72%   1.821     141
+.03        2400    1136      1357       83.7%       3.5%      3.6%     1943   22.62     4.4%     3.2%    46%   1.347     154
+.60        2549    1261      1533       82.3%       3.4%      3.7%     2053   21.53     4.3%     3.3%    51%   1.322     176
+.29        2867    1393      1694       82.2%       3.7%      3.9%     2347   21.22     4.7%     3.5%    35%   1.159     199
+.04        3154    1507      1830       82.3%       4.5%      4.3%     2607   19.33     5.7%     4.5%    17%   1.016     231
+.85        3389    1649      1979       83.3%       5.3%      5.2%     2761   16.37     6.7%     6.0%    27%   1.054     235
+.68        3724    1757      2104       83.5%       6.5%      6.1%     3088   14.63     8.1%     7.8%    15%   0.962     278
+.55        3720    1813      2197       82.5%       7.3%      7.6%     2999   12.84     9.2%     9.1%    16%   0.896     255
+.43        4079    1933      2384       81.1%       9.0%      9.5%     3352   11.01    11.3%    12.5%     9%   0.840     290
+.32        4199    2006      2420       82.9%      10.0%     10.5%     3474   10.17    12.7%    13.8%    14%   0.939     331
+.23        4363    2099      2551       82.3%      10.6%     11.0%     3595    9.91    13.4%    14.5%     5%   0.790     319
+.15        4651    2203      2634       83.6%      12.2%     12.5%     3827    9.29    15.3%    15.7%     7%   0.856     354
+.08        4745    2248      2758       81.5%      14.2%     14.7%     3945    8.32    18.0%    18.7%    -2%   0.822     358
+.01        4744    2287      2843       80.4%      14.3%     14.6%     3896    7.92    18.1%    19.2%     7%   0.868     343
+.95        5019    2429      2945       82.5%      18.5%     18.3%     4079    6.76    23.3%    24.6%     0%   0.789     359
+.90        5210    2484      3000       82.8%      20.4%     21.0%     4282    6.06    25.6%    27.9%    -4%   0.757     399
+.85        5272    2569      3119       82.4%      27.8%     28.0%     4272    4.77    35.0%    36.5%     4%   0.803     366
+.80        5054    2451      3171       77.3%      30.9%     31.1%     4092    4.20    39.0%    43.1%    -3%   0.788     354
+    total       73426   35268     43013       82.0%       6.5%      6.7%    60130   11.57     8.2%    11.7%    20%   0.963    5279
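The COMPLETENESS column in these tables is simply UNIQUE divided by POSSIBLE, so the drop from 94.6% to 82.0% can be checked directly from the two "total" rows. A minimal sketch in POSIX sh with awk; the function name completeness is an assumption made for illustration.

```shell
#!/bin/sh
# completeness UNIQUE POSSIBLE: print the completeness in percent (one decimal).
# Hypothetical helper for checking the XSCALE.LP numbers by hand.
completeness() {
    awk -v u="$1" -v p="$2" 'BEGIN { printf "%.1f\n", 100 * u / p }'
}

completeness 21742 22971   # "total" row with Friedel pairs merged
completeness 35268 43013   # "total" row with FRIEDEL'S_LAW=FALSE
```

This prints 94.6 and 82.0, in agreement with the tables. With Friedel pairs counted separately the number of possible reflections almost doubles, but not every reflection has both mates measured, hence the lower value.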
+One hint towards the contents of the "crystal" is that the information about the simulated data contained the string "1g1c". This structure has been solved (in a different spacegroup!) and can be found in the PDB; it contains 99 residues, among which there are 2 Cys and 2 Met. We therefore assume that the simulated data represent sulfur-SAD. Using [[ccp4:hkl2map|hkl2map]], we can easily find four sites with good CCall/CCweak:
+ shelxe.beta -m50 -a -q -h -s0.77 -b -i 1g1c 1g1c_fa