Simulated-1g1c: Difference between revisions

Line 156: Line 156:
     38.300    79.097    79.100    90.000    90.000    90.000      19.0
     38.300    79.097    79.100    90.000    90.000    90.000      19.0


Why not use all datasets? The reason is that cellparm has a limit of 20 datasets!
Why not use all datasets? The reason is that cellparm has a limit of 20 datasets! But it seems to confirm that the cell axes are really 38.3, 79.1, 79.1.


Now we run xscale with the following XSCALE.INP :
Now we run xscale with the following XSCALE.INP :


<pre>
<pre>
UNIT_CELL_CONSTANTS=38.3 79.1 79.1  90 90 90
SPACE_GROUP_NUMBER=19
OUTPUT_FILE=temp.ahkl
OUTPUT_FILE=temp.ahkl


Line 273: Line 271:
  DATA SETS  NUMBER OF COMMON  CORRELATION  RATIO OF COMMON  B-FACTOR
  DATA SETS  NUMBER OF COMMON  CORRELATION  RATIO OF COMMON  B-FACTOR
   #i  #j    REFLECTIONS    BETWEEN i,j  INTENSITIES (i/j)  BETWEEN i,j
   #i  #j    REFLECTIONS    BETWEEN i,j  INTENSITIES (i/j)  BETWEEN i,j
with these 99 lines:
with these final 99 lines:
     1  100          12          0.601            0.8200        0.0085
     1  100          12          0.601            0.8200        0.0085
     2  100          24          0.998            0.9001        0.5637
     2  100          24          0.998            0.9001        0.5637
Line 420: Line 418:
     total      10297    7912    22966      34.5%      5.6%      5.9%    4363    9.17    7.6%    11.5%    -9%  0.741      24
     total      10297    7912    22966      34.5%      5.6%      5.9%    4363    9.17    7.6%    11.5%    -9%  0.741      24


Now we are ready to run our script "bootstrap.rc" a second time. Actually it would be enough to run the CORRECT step but since it only takes 2 minutes we don't bother to change the script. After this, we run xscale a third time, using the same XSCALE.INP as the first time. The result is
== second round of bootstrap ==
 
Now we are ready to run our script "bootstrap.rc" a second time. Actually it would be enough to run the CORRECT step but since it only takes 2 minutes we don't bother to change the script. After this, we run xscale a third time, using the same XSCALE.INP (with all its 100 INPUT_FILE= lines) as the first time. The result is


  SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
  SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
Line 450: Line 450:
so the data are practically complete, and actually quite good. The anomalous signal suggests that it may be possible to solve the structure from its anomalous signal.
so the data are practically complete, and actually quite good. The anomalous signal suggests that it may be possible to solve the structure from its anomalous signal.


We can find out the correct spacegroup (19 !) with "pointless xdsin temp.ahkl".
We can find out the correct spacegroup (19 !) with "pointless xdsin temp.ahkl", and adjust our script accordingly.


Now we do another round, since the completeness is so good. We can then identify those few datasets which are still not indexed in the right setting, fix those manually. It was only xtal085 which made this necessary - it turned out that the indexing had not found the correct lattice, which was fixed with STRONG_PIXEL=6.
Now we do another round, since the completeness is so good. We can then identify those few datasets which are still not indexed in the right setting, and fix those manually. It was only xtal085 which had a problem - it turned out that the indexing had not found the correct lattice, which was fixed with STRONG_PIXEL=6.
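One way to spot datasets that may still be indexed in a different setting is to scan the pairwise correlation table of XSCALE.LP for low correlation coefficients. The following is a minimal sketch in POSIX sh and awk, not part of the original script. The function name low_cc_pairs, the default threshold of 0.9, and the assumption that each table row has exactly six numeric columns (#i, #j, common reflections, correlation, intensity ratio, B-factor) are ours, as is the assumption that a low correlation is the symptom of a wrong setting.

```sh
#!/bin/sh
# low_cc_pairs FILE [THRESHOLD]
# Print "#i #j correlation" for every row of the pairwise correlation
# table whose correlation is below THRESHOLD (default 0.9, an arbitrary
# value chosen for illustration only).
low_cc_pairs() {
    awk -v thr="${2:-0.9}" '
        # A table row is assumed to have six fields: two integer dataset
        # numbers, an integer count, and three decimal numbers.
        NF == 6 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ &&
        $4 ~ /^-?[0-9]*\.[0-9]+$/ && $5 ~ /^-?[0-9]*\.[0-9]+$/ &&
        $6 ~ /^-?[0-9]*\.[0-9]+$/ {
            if ($4 + 0 < thr + 0) print $1, $2, $4
        }
    ' "$1"
}
```

Called as "low_cc_pairs XSCALE.LP", this lists candidate pairs for manual inspection. Other six-column numeric lines in the file would also match, so the output is only a hint.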


The final XSCALE.LP is then:
The final XSCALE.LP is then:
Line 485: Line 485:


== Optimizing the result ==
== Optimizing the result ==
One way to improve XDS's knowledge of the geometry is to use all 15 frames for indexing, while still integrating only frame 1. This is easily accomplished by changing the following lines in the script:
 JOB=XYCORR INIT COLSPOT IDXREF DEFPIX
 DATA_RANGE=1 15
 SPOT_RANGE=1 15
and by replacing the line "xds >& xds.log &" with "../../run_xds.rc &", using the following run_xds.rc:
<pre>
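#!/bin/csh -f
# first pass: XDS.INP still has JOB=XYCORR INIT COLSPOT IDXREF DEFPIX (frames 1 to 15)
xds
# rewrite XDS.INP so that the second pass integrates and corrects frame 1 only
egrep -v 'DATA_RANGE|JOB' XDS.INP >x
echo JOB=INTEGRATE CORRECT >XDS.INP
echo DATA_RANGE=1 1 >> XDS.INP
cat x >> XDS.INP
xds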
</pre>
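The XDS.INP rewrite in run_xds.rc can also be packaged as a small function that avoids the fixed scratch file "x". This is a sketch in POSIX sh rather than csh. The function name switch_to_integrate and the use of mktemp are our assumptions and not part of the original script.

```sh
#!/bin/sh
# switch_to_integrate FILE
# Replace the JOB= and DATA_RANGE= lines of FILE (normally XDS.INP) so
# that the next xds run integrates and corrects frame 1 only.
switch_to_integrate() {
    inp=$1
    [ -r "$inp" ] || return 1
    tmp=$(mktemp) || return 1
    # Keep every line that mentions neither DATA_RANGE nor JOB.
    # grep exits with status 1 when no line is left, which is fine here.
    grep -Ev 'DATA_RANGE|JOB' "$inp" > "$tmp" || true
    {
        echo "JOB=INTEGRATE CORRECT"
        echo "DATA_RANGE=1 1"
        cat "$tmp"
    } > "$inp"
    rm -f "$tmp"
}
```

A Bourne-shell variant of run_xds.rc would then call xds, "switch_to_integrate XDS.INP", and xds again. As in the csh version, the SPOT_RANGE line is left untouched.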
It also seems wise to change "sleep 1" to "sleep 5", because each COLSPOT run now has to look at 15 frames instead of one and therefore takes a little longer. Indeed, the result is a bit better:
SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION    NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA  R-meas  Rmrgd-F  Anomal  SigAno  Nano
  LIMIT    OBSERVED  UNIQUE  POSSIBLE    OF DATA  observed  expected                                      Corr
    8.05        798    274      304      90.1%      4.4%      4.2%      726  23.88    5.2%    3.1%    71%  1.932      49
    5.69        1514    480      515      93.2%      4.5%      4.5%    1421  23.66    5.3%    3.4%    76%  1.670      83
    4.65        1951    599      639      93.7%      4.3%      4.4%    1845  24.57    5.0%    3.3%    67%  1.561    139
    4.03        2399    713      753      94.7%      4.1%      4.5%    2289  24.76    4.8%    3.1%    44%  1.176    154
    3.60        2546    786      840      93.6%      3.9%      4.5%    2417  23.78    4.6%    3.1%    46%  1.127    175
    3.29        2864    876      919      95.3%      4.2%      4.7%    2729  23.35    4.9%    3.2%    38%  1.018    199
    3.04        3154    918      987      93.0%      5.0%      5.2%    3037  21.98    5.8%    3.9%    18%  0.922    231
    2.85        3387    1015      1066      95.2%      5.9%      6.1%    3235  18.74    7.0%    5.2%    26%  0.992    235
    2.68        3724    1082      1126      96.1%      7.2%      7.2%    3583  17.03    8.4%    6.7%    15%  0.890    278
    2.55        3720    1111      1172      94.8%      8.3%      8.6%    3536  15.02    9.7%    8.1%    14%  0.857    255
    2.43        4079    1198      1267      94.6%      9.8%    10.6%    3898  12.96    11.5%    10.3%    9%  0.781    290
    2.32        4199    1221      1283      95.2%      11.1%    11.7%    4024  12.21    12.9%    10.8%    12%  0.911    331
    2.23        4365    1282      1350      95.0%      11.4%    12.2%    4205  11.87    13.4%    12.6%    3%  0.729    319
    2.15        4651    1332      1386      96.1%      13.3%    13.9%    4468  11.30    15.5%    12.5%    5%  0.821    354
    2.08        4745    1380      1455      94.8%      15.0%    16.0%    4569  10.04    17.6%    14.0%    -1%  0.760    358
    2.01        4744    1418      1496      94.8%      15.4%    16.0%    4531    9.50    18.1%    16.3%    5%  0.820    343
    1.95        5019    1487      1550      95.9%      19.6%    19.7%    4813    8.27    23.0%    19.7%    -1%  0.765    359
    1.90        5210    1504      1571      95.7%      21.9%    22.9%    5007    7.53    25.6%    22.8%    -6%  0.740    399
    1.85        5272    1561      1633      95.6%      29.1%    30.1%    5054    5.98    34.1%    28.8%    4%  0.801    366
    1.80        5054    1505      1659      90.7%      33.2%    34.1%    4822    5.25    38.9%    35.2%    -1%  0.790    354
    total      73395  21742    22971      94.6%      7.3%      7.7%    70209  13.46    8.6%    9.8%    16%  0.890    5271
but there does not appear to be a "magic bullet" that would produce much better data than the quick bootstrap approach.
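To compare runs like these without reading whole tables, the "total" row can be pulled out of XSCALE.LP. The following is a minimal sketch in POSIX sh and awk. The function name xscale_total is hypothetical, and the column positions (completeness in field 5, I/SIGMA in field 9, R-meas in field 10 of a 14-field row) are read off the table layout shown on this page and assumed to be stable. It reports the last "total" row, on the assumption that the table of interest is the last one in the file.

```sh
#!/bin/sh
# xscale_total FILE
# Print completeness, I/SIGMA and R-meas from the last "total" row of FILE.
xscale_total() {
    awk '
        # Remember the values of each 14-field "total" row; the last one wins.
        $1 == "total" && NF == 14 { c = $5; s = $9; r = $10; found = 1 }
        END { if (found) print "completeness=" c, "I/sigma=" s, "R-meas=" r }
    ' "$1"
}
```

Running "xscale_total XSCALE.LP" after each round gives one line per run, which makes it easy to see whether a change such as indexing from all 15 frames actually helped.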
== Solving the structure ==
First, we repeat xscale after inserting FRIEDEL'S_LAW=FALSE into XSCALE.INP. This gives us:
      NOTE:      Friedel pairs are treated as different reflections.
SUBSET OF INTENSITY DATA WITH SIGNAL/NOISE >= -3.0 AS FUNCTION OF RESOLUTION
RESOLUTION    NUMBER OF REFLECTIONS    COMPLETENESS R-FACTOR  R-FACTOR COMPARED I/SIGMA  R-meas  Rmrgd-F  Anomal  SigAno  Nano
  LIMIT    OBSERVED  UNIQUE  POSSIBLE    OF DATA  observed  expected                                      Corr
    8.05        804    382      476      80.3%      3.1%      3.4%      665  24.13    3.9%    2.7%    81%  2.507      50
    5.69        1527    723      882      82.0%      3.4%      3.6%    1251  22.48    4.2%    3.1%    85%  2.223      87
    4.65        1956    938      1136      82.6%      3.4%      3.6%    1602  22.73    4.3%    3.0%    72%  1.821    141
    4.03        2400    1136      1357      83.7%      3.5%      3.6%    1943  22.62    4.4%    3.2%    46%  1.347    154
    3.60        2549    1261      1533      82.3%      3.4%      3.7%    2053  21.53    4.3%    3.3%    51%  1.322    176
    3.29        2867    1393      1694      82.2%      3.7%      3.9%    2347  21.22    4.7%    3.5%    35%  1.159    199
    3.04        3154    1507      1830      82.3%      4.5%      4.3%    2607  19.33    5.7%    4.5%    17%  1.016    231
    2.85        3389    1649      1979      83.3%      5.3%      5.2%    2761  16.37    6.7%    6.0%    27%  1.054    235
    2.68        3724    1757      2104      83.5%      6.5%      6.1%    3088  14.63    8.1%    7.8%    15%  0.962    278
    2.55        3720    1813      2197      82.5%      7.3%      7.6%    2999  12.84    9.2%    9.1%    16%  0.896    255
    2.43        4079    1933      2384      81.1%      9.0%      9.5%    3352  11.01    11.3%    12.5%    9%  0.840    290
    2.32        4199    2006      2420      82.9%      10.0%    10.5%    3474  10.17    12.7%    13.8%    14%  0.939    331
    2.23        4363    2099      2551      82.3%      10.6%    11.0%    3595    9.91    13.4%    14.5%    5%  0.790    319
    2.15        4651    2203      2634      83.6%      12.2%    12.5%    3827    9.29    15.3%    15.7%    7%  0.856    354
    2.08        4745    2248      2758      81.5%      14.2%    14.7%    3945    8.32    18.0%    18.7%    -2%  0.822    358
    2.01        4744    2287      2843      80.4%      14.3%    14.6%    3896    7.92    18.1%    19.2%    7%  0.868    343
    1.95        5019    2429      2945      82.5%      18.5%    18.3%    4079    6.76    23.3%    24.6%    0%  0.789    359
    1.90        5210    2484      3000      82.8%      20.4%    21.0%    4282    6.06    25.6%    27.9%    -4%  0.757    399
    1.85        5272    2569      3119      82.4%      27.8%    28.0%    4272    4.77    35.0%    36.5%    4%  0.803    366
    1.80        5054    2451      3171      77.3%      30.9%    31.1%    4092    4.20    39.0%    43.1%    -3%  0.788    354
    total      73426  35268    43013      82.0%      6.5%      6.7%    60130  11.57    8.2%    11.7%    20%  0.963    5279
One hint towards the contents of the "crystal" is that the information about the simulated data contained the string "1g1c". This structure has been solved (in a different spacegroup!) and can be found in the PDB; it contains 99 residues, among which there are 2 Cys and 2 Met. We therefore assume that the simulated data represent sulfur-SAD. Using [[ccp4:hkl2map|hkl2map]], we can easily find four sites with good CCall/CCweak:
shelxe.beta -m50 -a -q -h -s0.77 -b -i 1g1c 1g1c_fa