Solve a small-molecule structure: Difference between revisions

Line 1: Line 1:
The following is based on the experience of a protein crystallographer who one day obtained a small-molecule dataset and managed to solve and refine it without prior knowledge of what the crystallized substance was. It was a very rewarding experience, which is why it is written up here.
The following is based on the experience of a protein crystallographer who one day obtained a small-molecule dataset and managed to solve and refine it without prior knowledge of what the crystallized substance was, and without experience in small-molecule crystallography. It was a very rewarding experience (see the figure at the bottom), which is why it is written up here.


This is just a case study. To understand things, one has to read http://shelx.uni-ac.gwdg.de/SHELX/shelx.pdf .
This is just a case study. To understand things, one has to read http://shelx.uni-ac.gwdg.de/SHELX/shelx.pdf .


== reduce the data with your favourite data processing software ==
== Reduce the data with your favourite data processing software ==
I use [[xds:Main_Page|XDS]]. The decision about the spacegroup has to be postponed, but it certainly helps if the correct Laue group is employed during scaling. In the case considered here, the CORRECT step suggested P222 (XDS should really only suggest "222 point symmetry", because CORRECT does not look at systematic absences at this point).
I use [[xds:Main_Page|XDS]]. The decision about the spacegroup has to be postponed, but it certainly helps if the correct Laue group is employed during scaling. In the case considered here, the CORRECT step suggested P222 (XDS should really only suggest "222 point symmetry", because CORRECT does not look at systematic absences at this point).


== convert the reflection file to HKLF 4 format (intensities!) ==
== Determine the spacegroup ==
The HKLF 4 format is what the SHELX programs read. I used [[xds:XDSCONV|XDSCONV]] and the following XDSCONV.INP:  
 
If there are different spacegroup possibilities then (downstream, in structure solution and refinement) we need to try all of them in turn, until we hit one that refines really satisfactorily (R-factor below, say, 5%) and gives a structure that makes sense.
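As a reminder of what the R-factor threshold refers to: the conventional R1 is the sum of the absolute differences between observed and calculated structure-factor amplitudes, divided by the sum of the observed amplitudes. The following is only a minimal Python sketch of that definition; the amplitudes are made up for illustration and are not from the dataset discussed here.

```python
def r1(f_obs, f_calc):
    """Conventional R1 = sum(| |Fo| - |Fc| |) / sum(|Fo|)."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Hypothetical amplitudes, NOT taken from the dataset in this article.
f_obs = [100.0, 50.0, 25.0, 10.0]
f_calc = [96.0, 52.0, 24.0, 11.0]

value = r1(f_obs, f_calc)
print(f"R1 = {value:.3f}")
print("below 5%" if value < 0.05 else "not below 5%")
```

In practice SHELXL prints R1 itself; the sketch is only meant to make the "below, say, 5%" criterion concrete.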
 
=== use [[XPREP]] to find out possible spacegroups ===
 
First, convert the reflection file to HKLF 4 format (intensities!). The HKLF 4 format is what the SHELX programs read. I used [[xds:XDSCONV|XDSCONV]] and the following XDSCONV.INP:  
  SPACE_GROUP_NUMBER=  1
  UNIT_CELL_CONSTANTS=    14.433    28.704    8.488  90.000  90.000  90.000
  INPUT_FILE=XDS_ASCII.HKL
  INPUT_FILE=XDS_ASCII.HKL
  OUTPUT_FILE=temp.hkl SHELX
  OUTPUT_FILE=temp.hkl SHELX
To preserve the full information about systematic absences for use in [[XPREP]], it is important that XDSCONV runs in spacegroup 1 (P1). This does not necessarily mean that CORRECT also has to run in spacegroup 1, because XDS_ASCII.HKL contains all observations no matter in which spacegroup the CORRECT step runs. As long as the spacegroup used in the CORRECT step is primitive, this works nicely. But if some re-indexing between CORRECT's spacegroup and P1 is necessary (as for I, F, C, R lattices), then it is probably safest to simply run CORRECT in P1.
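If several datasets have to be converted, the XDSCONV.INP can be generated by a small script. This is a minimal Python sketch, not part of XDS: the function name <code>xdsconv_inp</code> is hypothetical, and the <code>SHELX</code> format keyword after the output file name is my assumption about what XDSCONV expects for HKLF 4 output. The cell is the one used in this article.

```python
def xdsconv_inp(cell, input_file="XDS_ASCII.HKL", output_file="temp.hkl"):
    """Return the text of an XDSCONV.INP that converts in spacegroup 1 (P1).

    cell is (a, b, c, alpha, beta, gamma).
    The trailing 'SHELX' format keyword is an assumption, see the text above.
    """
    a, b, c, alpha, beta, gamma = cell
    lines = [
        "SPACE_GROUP_NUMBER=1",
        f"UNIT_CELL_CONSTANTS= {a:.3f} {b:.3f} {c:.3f} "
        f"{alpha:.3f} {beta:.3f} {gamma:.3f}",
        f"INPUT_FILE={input_file}",
        f"OUTPUT_FILE={output_file} SHELX",
    ]
    return "\n".join(lines) + "\n"


text = xdsconv_inp((14.433, 28.704, 8.488, 90.0, 90.0, 90.0))
print(text, end="")
# To use it, save the text as XDSCONV.INP in the working directory
# and run xdsconv there.
```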


== first try: wrong spacegroup ==
Answer the question concerning the cell axes, and then hit <Enter> several (about 6) times until the program suggests a list of spacegroups; this choice is going to be important. It may help to check whether the structure is centrosymmetric or not, from the line: Mean |E*E-1| = 0.939 [expected .968 centrosym and .736 non-centrosym]. Fortunately there's only one spacegroup consistent with the data:
=== run [[XPREP]] to find out possible spacegroups ===
Answer the question concerning the cell axes, and then hit <Enter> several times until the program suggests a list of spacegroups; this choice is going to be important. It helps a bit to check beforehand whether the structure is centrosymmetric or not, from the line: Mean |E*E-1| = 0.939 [expected .968 centrosym and .736 non-centrosym].
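The comparison behind the |E*E-1| line can be written down in a few lines. This is only a sketch in Python that picks the nearer of the two expected values quoted by XPREP; the function name is hypothetical, and the statistic can be misleading (for example with twinning or pseudo-symmetry), so it is a hint rather than a decision.

```python
# Expected values as quoted in the XPREP output line.
EXPECTED_CENTRO = 0.968
EXPECTED_NONCENTRO = 0.736


def guess_centrosymmetry(mean_e2_minus_1):
    """Return which expected value the observed mean |E*E-1| is closer to."""
    d_centro = abs(mean_e2_minus_1 - EXPECTED_CENTRO)
    d_noncentro = abs(mean_e2_minus_1 - EXPECTED_NONCENTRO)
    if d_centro < d_noncentro:
        return "centrosymmetric"
    return "non-centrosymmetric"


guess = guess_centrosymmetry(0.939)  # value reported for this dataset
print(guess)
```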
<pre>
<pre>
SPACE GROUP DETERMINATION
Lattice exceptions:  P      A      B      C      I      F    Obv    Rev    All
N (total) =          0  28832  28824  28788  28823  43222  38376  38344  57564
N (int>3sigma) =      0  17961  18421  18158  17862  27270  24715  24627  36959
Mean intensity =    0.0  22.7  23.7  24.8  23.4  23.7  24.7  24.8  24.8
Mean int/sigma =    0.0    9.6  10.0    9.9    9.6    9.8  10.0  10.0  10.0
Crystal system O and Lattice type P selected
Mean |E*E-1| = 0.939 [expected .968 centrosym and .736 non-centrosym]
Chiral flag NOT set
Systematic absence exceptions:
Systematic absence exceptions:


         b--  c--  n--  21--  -c-  -a-  -n-  -21-  --a  --b  --n  --21  
         b--  c--  n--  21--  -c-  -a-  -n-  -21-  --a  --b  --n  --21  


N       938   938    0    0   411    0   411    0    0   237   237    0
N     1884  1884  1892    7   988  1014   992    28   545   541   534    72
N I>3s  706  706    0    0  304    0  304    0    0  203  203    0
N I>3s  706  706    0    0  304    0  304    0    0  203  203    0
<I>    50.0 50.0   0.0   0.0  43.1   0.0 43.1  0.0   0.0 56.6 56.6   0.0
<I>    25.2 25.2   0.5   0.0  18.2   0.4 18.1  0.4   0.4 25.0 25.4   0.4
<I/s> 14.1  14.1   0.0   0.0  15.2  0.0 15.2   0.0   0.0 16.4 16.4  0.0
<I/s>   7.3  7.3   0.5   0.2  6.0.5   6.6   0.0.4   7.4   7.6   0.4




Line 29: Line 54:
Option  Space Group  No.  Type  Axes  CSD  R(sym) N(eq)  Syst. Abs.  CFOM
Option  Space Group  No.  Type  Axes  CSD  R(sym) N(eq)  Syst. Abs.  CFOM


[A] P222           # 16  chiral  1    14  0.022  9725  0.0 / 10.7  11.72
[A] Pccn           # 56 centro  3  196 0.023 10123 0.5 /  6.6  2.23
[B] Pmm2          # 25  non-cen  1    9  0.022  9725  0.0 / 10.7  15.05
[C] Pmm2          # 25  non-cen  5    9  0.022  9725  0.0 / 10.7  15.05
[D] Pmm2          # 25  non-cen  3    9  0.022  9725  0.0 / 10.7  15.05
[E] Pmmm          # 47 centro   1    7  0.022  9725  0.0 / 10.7  13.52
[F] P222(1)        # 17  chiral  1    26  0.022  9725  0.0 / 10.7  8.76
[G] P222(1)        # 17  chiral  5    26  0.022  9725  0.0 / 10.7  8.76
[H] P222(1)        # 17  chiral   3   26  0.022   9725 0.0 / 10.7  8.76
[I] P2(1)2(1)2    # 18  chiral  1  359  0.022  9725  0.0 / 10.7  5.33
[J] P2(1)2(1)2    # 18  chiral  5  359  0.022  9725  0.0 / 10.7  5.33
[K] P2(1)2(1)2    # 18  chiral  3  359  0.022  9725  0.0 / 10.7  5.33
[L] P2(1)2(1)2(1) # 19  chiral  1  5917  0.022  9725  0.0 / 10.7  5.07
[M] Pmc2(1)        # 26  non-cen  3    20  0.022  9725  0.0 / 10.7  9.81
[N] Pmc2(1)        # 26  non-cen  4    20  0.022  9725  0.0 / 10.7  9.81
[O] Pmma          # 51  centro  1    14  0.022  9725  0.0 / 10.7  7.69
[P] Pmma          # 51  centro  6    14  0.022  9725  0.0 / 10.7  7.69
[R] Pma2          # 28  non-cen  1    1  0.022  9725  0.0 / 10.7  55.05
[S] Pma2          # 28  non-cen  6    1  0.022  9725  0.0 / 10.7  55.05
[T] Pmn2(1)        # 31  non-cen  2    53  0.022  9725  0.0 / 10.7  6.90
[U] Pmn2(1)        # 31  non-cen  5    53  0.022  9725  0.0 / 10.7  6.90
[V] Pmmn          # 59  centro  3    42  0.022  9725  0.0 / 10.7  3.35
[W] Pcc2          # 27  non-cen  3    2  0.022  9725  0.0 / 10.7  38.39
[X] Pccm          # 49  centro  3    1  0.022  9725  0.0 / 10.7  51.02
[Y] Pna2(1)        # 33  non-cen  1  903  0.022  9725  0.0 / 10.7  5.16
[Z] Pna2(1)        # 33  non-cen 6   903  0.022  9725  0.0 / 10.7  5.16
[0] Pnma          # 62  centro  1  894  0.022  9725  0.0 / 10.7  1.14
[1] Pnma          # 62  centro  894  0.022  9725  0.0 / 10.7  1.14
[2] Pccn          # 56  centro  3  196  0.022  9725  0.0 / 10.7  1.53


Option [1] chosen
Option [A] chosen
</pre>
</pre>
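The "Systematic absence exceptions" table can be read as follows: for each candidate glide plane or screw axis, XPREP lists the reflections that would have to be absent if that symmetry element were present. Where their mean intensity is close to zero, the element is probably there. A minimal Python sketch of this reading, using the <I> row with values 25.2, 25.2, 0.5, ... from the output above; the 10% cutoff and the reference intensity of 24.8 (the "All" value in the Mean intensity line) are my own choices for illustration, not something XPREP documents.

```python
columns = ["b--", "c--", "n--", "21--", "-c-", "-a-",
           "-n-", "-21-", "--a", "--b", "--n", "--21"]
mean_i = [25.2, 25.2, 0.5, 0.0, 18.2, 0.4,
          18.1, 0.4, 0.4, 25.0, 25.4, 0.4]

REFERENCE_INTENSITY = 24.8  # assumption: overall mean intensity from the output
CUTOFF_FRACTION = 0.10      # assumption: "essentially absent" below 10% of that


def probable_elements(names, intensities, reference, fraction):
    """Return the symmetry elements whose would-be absences are really weak."""
    limit = reference * fraction
    return [name for name, i in zip(names, intensities) if i < limit]


present = probable_elements(columns, mean_i, REFERENCE_INTENSITY, CUTOFF_FRACTION)
print(present)
```

With these numbers the weak classes are the n, a and a glide candidates for the three axes plus the three screw-axis candidates; how XPREP maps that onto a conventional spacegroup setting is left to the program.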
(The program chooses Option "1" (Pnma) by default, which later turns out to be wrong. How the correct spacegroup (Pccn) could be identified at this point, I don't know.)


After that, say "c" for "define unit-cell CONTENTS", and input a reasonable number of carbon atoms (I used C20). Get out of this menu with "E". Then, choose "f" for "set up shelxtl FILES". Then, answer the question "XM/SHELXD (M) or XS/SHELXS (S) format [S]:" with "m" since we're going to use shelxd for solving the structure. Answer the question about the name (I used the spacegroup number as I knew I would have to test several possibilities). Finally, "q"uit the program. The resulting 62.ins is:
After that, say "c" for "define unit-cell CONTENTS", and input a reasonable number of carbon atoms (I used C20). Get out of this menu with "E". Then, choose "f" for "set up shelxtl FILES". Then, answer the question "XM/SHELXD (M) or XS/SHELXS (S) format [S]:" with "m" since we're going to use shelxd for solving the structure. Answer the question about the name (I used the spacegroup number as I knew I would have to test several possibilities). Finally, "q"uit the program. This writes 56.ins :
  TITL 62 in Pnma
  TITL 56 in Pccn
  CELL 0.71073   8.4900 28.7000  14.4300 90.000  90.000  90.000
  CELL 0.71073 14.4330 28.7040  8.4880 90.000  90.000  90.000
  ZERR  11.00  0.0017   0.0057  0.0029   0.000  0.000  0.000
  ZERR  11.00  0.0029   0.0057  0.0017   0.000  0.000  0.000
  LATT  1
  LATT  1
  SYMM 0.5-X, -Y, 0.5+Z
  SYMM 0.5-X, 0.5-Y, Z
  SYMM -X, 0.5+Y, -Z
  SYMM -X, 0.5+Y, 0.5-Z
  SYMM 0.5+X, 0.5-Y, 0.5-Z
  SYMM 0.5+X, -Y, 0.5-Z
  SFAC C
  SFAC C
  UNIT 220
  UNIT 220
Line 79: Line 76:
  END
  END
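A quick plausibility check for the number of atoms to expect is the cell volume. The sketch below, in Python, reads the CELL line of 56.ins and applies the common rule of thumb of roughly 18 cubic Angstrom per non-hydrogen atom; that rule and the helper names are my additions, not something XPREP or SHELXD prints. The 8 general positions follow from the three SYMM lines plus identity, doubled by the inversion centre implied by LATT 1.

```python
import math


def parse_cell(line):
    """Parse 'CELL wavelength a b c alpha beta gamma' into six floats."""
    fields = line.split()
    if fields[0].upper() != "CELL" or len(fields) < 8:
        raise ValueError("not a CELL line")
    return tuple(float(x) for x in fields[2:8])


def cell_volume(a, b, c, alpha, beta, gamma):
    """General (triclinic) unit-cell volume; angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


VOLUME_PER_ATOM = 18.0   # assumption: rule of thumb, cubic Angstrom per non-H atom
GENERAL_POSITIONS = 8    # (3 SYMM lines + identity) * 2 for the inversion centre

cell = parse_cell("CELL 0.71073  14.4330  28.7040   8.4880  90.000  90.000  90.000")
volume = cell_volume(*cell)
atoms_per_asu = volume / VOLUME_PER_ATOM / GENERAL_POSITIONS

print(f"cell volume: {volume:.0f} A^3")
print(f"estimated non-H atoms per asymmetric unit: {atoms_per_asu:.0f}")
```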


=== solving the structure with [[SHELX C/D/E|SHELXD]] ===
== Solve the structure with [[SHELX C/D/E|SHELXD]] ==
Just run "shelxd 62". You may interrupt it with Ctrl-C once it has found good solutions, as suggested by
Just run "shelxd 56". You may interrupt it with Ctrl-C once it has found a good solution, as suggested by
  Try 68:20  Peaks 99 96 71 68 63 55 53 51 50 48 46 45 45 44 44 43 43 43 41 40
  Try 11:20  Peaks 99 92 87 87 87 83 77 73 71 70 68 68 64 64 64 63 62 62 61 60
  R = 0.417, Min.fun. = 0.853, <cos> = 0.364, Ra = 0.432
  R = 0.294, Min.fun. = 0.747, <cos> = 0.491, Ra = 0.235
  Try    68, CC All/Weak 40.17 / 25.34, best 40.17 / 25.60, best final CC  0.00
  Try    11, CC All/Weak 59.81 / 46.01, best 59.81 / 46.01, best final CC  0.00
  Peaklist optimization cycle  1    CC = 46.01 %    BG = 0.638   for  21 atoms
  Peaklist optimization cycle  1    CC = 77.51 %    BG = 0.322   for  22 atoms
  Peaks: 99 91 66 63 63 55 54 49 49 45 43 43 42 42 41 39 20 18 18 17 17 -17 
  Peaks: 99 90 87 85 82 77 75 74 66 64 64 64 63 63 62 57 39 39 36 36 33 31   
  Fragments: 7 4 4 3 1 1 1                                                   
  Fragments: 17 5                                                             
  Peaklist optimization cycle  2    CC = 52.55 %    BG = 0.593   for  25 atoms
  Peaklist optimization cycle  2    CC = 88.80 %    BG = 0.225   for  25 atoms
  Peaks: 99 94 85 74 73 73 70 70 66 64 64 63 62 62 61 60 60 60 59 59 57 24 -24
  Peaks: 99 95 89 88 87 84 82 79 78 78 77 76 75 75 74 73 73 71 71 69 67 65 40
  Fragments: 14 5 4 1 1                                                       
  Fragments: 25                                                               
  Peaklist optimization cycle  3    CC = 58.37 %    BG = 0.541   for  29 atoms
  Peaklist optimization cycle  3    CC = 88.85 %    BG = 0.223   for  25 atoms
  Peaks: 99 92 85 72 72 70 69 66 65 63 63 62 61 60 59 59 59 -58 58 58 57 57 56
  Peaks: 99 96 89 87 86 86 82 79 79 76 76 75 75 75 73 73 72 71 69 69 67 65 63  
  Fragments: 17 7 4 1                                                         
  Fragments: 25                                                               
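When several spacegroups have to be tried, it can be convenient to pull the CC values out of the SHELXD output automatically instead of watching the terminal. This is a minimal Python sketch; the regular expression is written against the "Peaklist optimization cycle" lines exactly as they appear above and may need adjusting for other SHELXD versions.

```python
import re

# Lines copied from the SHELXD output of the Pccn run.
log = """\
 Peaklist optimization cycle  1    CC = 77.51 %    BG = 0.322   for  22 atoms
 Peaklist optimization cycle  2    CC = 88.80 %    BG = 0.225   for  25 atoms
 Peaklist optimization cycle  3    CC = 88.85 %    BG = 0.223   for  25 atoms
"""

pattern = re.compile(
    r"Peaklist optimization cycle\s+(\d+)\s+CC =\s*([\d.]+) %.*?for\s+(\d+) atoms"
)

# One (cycle, CC in percent, number of atoms) tuple per matching line.
cycles = [(int(m.group(1)), float(m.group(2)), int(m.group(3)))
          for m in pattern.finditer(log)]

best = max(cycles, key=lambda entry: entry[1])
print(f"best CC {best[1]:.2f} % in cycle {best[0]} with {best[2]} atoms")
```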


and the resulting 62.res is:
The resulting 56.res is:
<pre>
<pre>
REM TRY    77   FINAL CC 58.70   TIME      5 SECS
REM TRY    23   FINAL CC 88.85   TIME      3 SECS
REM Fragments: 17 7 3 2 2
REM Fragments: 25
REM  
REM  
TITL 62 in Pnma
TITL 56 in Pccn
CELL 0.71073   8.4900 28.7000  14.4300 90.000  90.000  90.000
CELL 0.71073 14.4330 28.7040  8.4880 90.000  90.000  90.000
ZERR  11.00  0.0017   0.0057  0.0029   0.000  0.000  0.000
ZERR  11.00  0.0029   0.0057  0.0017   0.000  0.000  0.000
LATT  1
LATT  1
SYMM 0.5-X, -Y, 0.5+Z
SYMM 0.5-X, 0.5-Y, Z
SYMM -X, 0.5+Y, -Z
SYMM -X, 0.5+Y, 0.5-Z
SYMM 0.5+X, 0.5-Y, 0.5-Z
SYMM 0.5+X, -Y, 0.5-Z
SFAC C
SFAC C
UNIT 220
UNIT 220
C001  1  0.15479 0.75000 0.04294 10.50000 0.1  99.00
C001  1  0.45835 0.41566 0.09083 11.00000 0.1  99.00
C002  1  0.84807 0.75000 -0.04054 10.50000 0.1  77.20
C002  1  0.36894 0.55007 -0.58932 11.00000 0.1  95.84
C003  1  0.19291 0.85742 -0.17716 11.00000 0.1  63.76
C003  1  0.52129 0.72099 -0.95623 11.00000 0.1  89.35
C004  1  0.59349 0.82735 0.20939 11.00000 0.1  62.64
C004  1  0.67521 0.30725 0.04587 11.00000 0.1  87.55
C005  1  0.84406 0.88664  0.13204 11.00000 0.1  61.71
C005  1  0.40328 0.54911 -0.45947 11.00000 0.1  85.96
C006  1  0.23705  0.75000 -0.14287 10.50000 0.1  61.63
...
...
C026 1  0.72766 0.95461 -0.10200 11.00000 0.1  51.40
C021 1  0.60567 0.70055 -0.97749 11.00000 0.1  66.94
C027 1  0.77380 0.96500  0.10122 11.00000 0.1  49.11
C022 1  0.49503 0.62079 -0.48787 11.00000 0.1  64.91
C028 1  0.39642 0.72972  0.17524 11.00000 0.1  24.76
C023 1  0.60066 0.62034 -0.48599 11.00000 0.1  63.62
C029 1  0.66918 0.82482 0.13969 11.00000 0.1  21.87
C024 1  0.63251 0.26331 0.06189 11.00000 0.1  63.01
C030 1  0.28518 0.73520  0.01792 11.00000 0.1   21.39
C025 1  0.47217 0.73227 -1.09548 11.00000 0.1  61.79
C031  1  0.40533  0.78770  0.08494 11.00000 0.1  19.94
HKLF 4
HKLF 4
END
END  
</pre>
</pre>
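The last column of each atom line in the .res file is the peak height, so a sharp drop in that column marks where real atoms end and noise begins. The following Python sketch finds the largest relative drop; the function names are hypothetical, the "largest relative drop" criterion is just one way to formalise the judgement, and the input is the C026 to C031 lines of 62.res shown above.

```python
res_tail = """\
C026  1  0.72766  0.95461 -0.10200 11.00000  0.1  51.40
C027  1  0.77380  0.96500  0.10122 11.00000  0.1  49.11
C028  1  0.39642  0.72972  0.17524 11.00000  0.1  24.76
C029  1  0.66918  0.82482  0.13969 11.00000  0.1  21.87
C030  1  0.28518  0.73520  0.01792 11.00000  0.1  21.39
C031  1  0.40533  0.78770  0.08494 11.00000  0.1  19.94
"""


def parse_atoms(text):
    """Return (name, peak_height) for lines that look like SHELXD atoms."""
    atoms = []
    for line in text.splitlines():
        f = line.split()
        # Atom lines: name, integer scattering-factor number, x, y, z, sof, U, peak.
        if len(f) == 8 and f[0][0].isalpha() and f[1].isdigit():
            atoms.append((f[0], float(f[7])))
    return atoms


def split_at_largest_drop(atoms):
    """Split a list sorted by decreasing peak height at its largest relative drop."""
    ratios = [atoms[i + 1][1] / atoms[i][1] for i in range(len(atoms) - 1)]
    cut = ratios.index(min(ratios)) + 1
    return atoms[:cut], atoms[cut:]


kept, dropped = split_at_largest_drop(parse_atoms(res_tail))
print("keep up to:", kept[-1][0])
print("candidates for removal:", [name for name, _ in dropped])
```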


=== Refinement in [[SHELXL]] ===
== Refine using [[SHELXL]] ==
In this example, it makes sense to remove the last 4 atoms since their peak height (the last column) is less than 25% of the maximum, and the last remaining atom is then at 49%, a large jump.
 
Insert
Copy 56.res to 56.ins. Insert
  ACTA
  ACTA
  LIST 6
  LIST 6
  L.S. 10
  L.S. 10
after the UNIT 220 instruction, and run "shelxl 62".
after the UNIT 220 instruction, and run "shelxl 56". This gives a first refined model, and its electron density map, plus the relevant statistics.
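The manual editing step can also be scripted, which is handy when several spacegroups are tested in turn. A minimal Python sketch, assuming the .res file contains exactly one UNIT instruction; the function name <code>res_to_ins</code> is hypothetical and the shortened input is for illustration only.

```python
def res_to_ins(res_text, extra=("ACTA", "LIST 6", "L.S. 10")):
    """Return res_text with the extra instructions inserted after UNIT."""
    out = []
    inserted = False
    for line in res_text.splitlines():
        out.append(line)
        if not inserted and line.strip().upper().startswith("UNIT"):
            out.extend(extra)
            inserted = True
    if not inserted:
        raise ValueError("no UNIT instruction found")
    return "\n".join(out) + "\n"


# Shortened example input; a real 56.res has all atoms listed.
res_text = """\
TITL 56 in Pccn
SFAC C
UNIT 220
C001  1  0.45835  0.41566  0.09083 11.00000  0.1  99.00
HKLF 4
END
"""

ins_text = res_to_ins(res_text)
print(ins_text, end="")
```

Writing <code>ins_text</code> to 56.ins then replaces the copy-and-edit step before running shelxl.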
 
It turns out that the R-factor does not go down properly, which means that the spacegroup is wrong. The "FINAL CC 58.70" result from SHELXD is probably also suspiciously low.
 
== structure solution and refinement in the correct spacegroup ==
 
We have to go back to XPREP and try a different spacegroup. This time I use Option "2" which means "Pccn" (number 56). SHELXD (finding 25 atoms, with a "FINAL CC 88.86") and SHELXL are run in the same way as above, but this time the R1 goes down to something above 10%, which indicates that this is probably a solution.


=== general idea of refining a structure ===
=== general idea of refining a structure ===
Line 158: Line 147:
For the H atoms, we just cut-and-paste the atoms from the bottom of the .res file into those lines where the other atoms are, if the distances to existing (heavy) atoms are close to 1 Å.
For the H atoms, we just cut-and-paste the atoms from the bottom of the .res file into those lines where the other atoms are, if the distances to existing (heavy) atoms are close to 1 Å.


== Finishing the structure ==
=== Finishing the structure ===


Finally we switch to anisotropic refinement by putting an  
Finally we switch to anisotropic refinement by putting an  