SHELXL: Difference between revisions

8,543 bytes added ,  14 March 2008
restrains N and CA of each amino-acid and O, CA and C of the  preceding residue to lie in a plane with a relatively large esd (0.3) (peptide planarity).<br>


 
The PRODRG server (http://davapc1.bioch.dundee.ac.uk/programs/prodrg/) is recommended for generating restraints in SHELX format for ligands etc.; the "J" option in SHELXPRO can also be useful for this if a model is already available.


== Least-squares refinement algebra ==


== Full-matrix estimates of standard uncertainties ==
Inversion of the full normal matrix (or of large matrix blocks, e.g., for all positional parameters) enables the precision of individual parameters to be estimated (Rollett, 1970), either with or without the inclusion of the restraints in the matrix. The standard uncertainties in dependent quantities (e.g., torsion angles or distances from mean planes) are calculated in SHELXL using the full least-squares correlation matrix. These standard uncertainties reflect the data-to-parameter ratio, i.e., the resolution and completeness of the data and the percentage of solvent, and the quality of the agreement between the observed and calculated F<sup>2</sup>-values (and the agreement of restrained quantities with their target values when restraints are included).<br>
If high-resolution data are available (there must be appreciably more data than parameters!) and the structure is not too large, it may be possible to obtain rigorous esds by matrix inversion. The structure should first be refined to convergence with CGLS, setting the second CGLS parameter to –1 so that Rfree is calculated. A further refinement should then be performed against all data by deleting the second CGLS parameter. Finally, a single full-matrix cycle (‘L.S. 1’) should be performed with zero damping and a zero shift multiplier (‘DAMP 0 0’) and with all restraints removed. Often ‘BLOC 1’ will be used so that the (anisotropic) displacement parameters are fixed in this final cycle, which makes the matrix appreciably smaller and more stable on inversion, but still allows the estimation of realistic standard deviations on all geometrical parameters. BOND, RTAB, HTAB and MPLA instructions may be needed to define the dependent parameters for which esds are required. In the rare cases where extra 'bonds' (e.g. to metal ions) are not generated automatically, they may need to be added with BIND to the connectivity table used by BOND.<br>
Given high-resolution data (better than 1.5 &Aring;), all restraints should be removed: lines beginning with SIMU, DELU, ISOR, BUMP, DFIX, DANG, CHIV, FLAT and NCSY should be deleted or preceded by "REM". Alternatively, one can determine standard uncertainties in the Bayesian sense that take all available knowledge into account by retaining all the restraints. This may be done at more modest resolution (say 2.5 &Aring; or better). To obtain mean values and s.u. of e.g. distances or chiral volumes that occur several times in a structure, use DFIX or CHIV with "free variables".<br>
== Refinement of anisotropic displacement parameters ==
The motion of macromolecules is clearly anisotropic, but the data-to-parameter ratio rarely permits the refinement of the six independent anisotropic displacement parameters (ADPs) per atom; even for small molecules and data to atomic resolution, the anisotropic refinement of disordered regions requires the use of restraints. SHELXL employs three types of ADP restraint (Sheldrick, 1993; Sheldrick & Schneider, 1997). The rigid bond restraint, first suggested by Rollett (1970), assumes that the components of the ADPs of two atoms connected via one (or two) chemical bonds are equal along the direction joining the two atoms, within a specified standard deviation. This has been shown to hold accurately (Hirshfeld, 1976; Trueblood & Dunitz, 1983) for precise structures of small molecules, so it can be applied as a ‘hard’ restraint with a small estimated standard deviation. The similar ADP restraint assumes that atoms that are spatially close (but not necessarily bonded, because they may be different components of a disordered group) have similar Uij components. An approximately isotropic restraint is useful for isolated solvent molecules. The latter two restraints are only approximate and so should be applied with low weights, i.e., high estimated standard deviations.<br>
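The rigid bond idea can be illustrated with a short sketch. It computes the mean-square displacement of each atom along the interatomic direction and returns the difference, which the restraint would drive towards zero. This is not how SHELXL itself is coded: the function names are hypothetical, and the ADP tensors are assumed to be already expressed in Cartesian axes (&Aring;<sup>2</sup>).

```python
import math

def msd_along(u, direction):
    """Mean-square displacement n^T U n along a direction.

    u is a symmetric 3x3 ADP tensor in Cartesian axes (assumption: already
    orthogonalised, in A^2); direction need not be normalised.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    n = [c / norm for c in direction]
    return sum(n[i] * u[i][j] * n[j] for i in range(3) for j in range(3))

def rigid_bond_difference(pos_a, u_a, pos_b, u_b):
    """Difference in mean-square displacement of atoms A and B along A-B."""
    bond = [b - a for a, b in zip(pos_a, pos_b)]
    return msd_along(u_a, bond) - msd_along(u_b, bond)

# Illustrative values only: two atoms 1.5 A apart along x.
u_a = [[0.020, 0.0, 0.0], [0.0, 0.030, 0.0], [0.0, 0.0, 0.040]]
u_b = [[0.021, 0.0, 0.0], [0.0, 0.050, 0.0], [0.0, 0.0, 0.025]]
delta = rigid_bond_difference((0.0, 0.0, 0.0), u_a, (1.5, 0.0, 0.0), u_b)
```

In this made-up example the two atoms differ strongly perpendicular to the bond but by only about 0.001 &Aring;<sup>2</sup> along it, which is the situation the rigid bond assumption describes.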
The transition from isotropic to anisotropic roughly doubles the number of parameters and almost always results in an appreciable reduction in the R-factor. However, this represents an improvement in the model only when it is accompanied by a significant reduction in the free R-factor (Brünger, 1992). Since the free R-factor is itself subject to uncertainty because of the small sample used, a drop of at least 1% is needed to justify anisotropic refinement. There should also be a reduction in the goodness of fit, and the resulting thermal ellipsoids should make chemical sense and not be ‘non-positive-definite’!<br>
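The parameter arithmetic behind this argument can be sketched as follows. The count is deliberately simplified (three coordinates plus one isotropic U or six Uij per atom; occupancies, scale factors and hydrogen atoms are ignored), and the function names are hypothetical.

```python
def parameter_count(n_atoms, anisotropic):
    """Approximate number of refined parameters: x, y, z plus Uiso or six Uij."""
    per_atom = 3 + (6 if anisotropic else 1)
    return n_atoms * per_atom

def anisotropy_justified(rfree_iso, rfree_aniso, min_drop=0.01):
    """True if the free R-factor fell by at least min_drop (1% as a fraction)."""
    return (rfree_iso - rfree_aniso) >= min_drop

# Illustrative numbers: 1000 atoms give 4000 isotropic
# and 9000 anisotropic parameters.
n_iso = parameter_count(1000, anisotropic=False)
n_aniso = parameter_count(1000, anisotropic=True)
```

The 1% threshold is the rule of thumb given in the text, not a statistical test. The goodness of fit and the shape of the ellipsoids still need to be inspected separately.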
== Modeling disorder ==
There are many ways of modeling disorder using SHELXL, but for macromolecules the most convenient is to retain the same atom and residue names for the two or more components and assign a different ‘part number’ (analogous to the PDB alternative site flag) to each component. With this technique, no change is required to the input restraints, etc. Atoms in the same component will normally have a common occupancy that is assigned to a free variable (fv). The starting values for the free variables are given, in order, on the FVAR instruction; note that there is no free variable number 1 (adding 10 fixes a parameter); the first FVAR parameter is the overall scale factor. Residues Glu_12 and Cys_38 have disordered side-chains in the example; their occupancies are tied to fv(2) (for the atoms in component [PART] 1) and to 1-fv(2) for the atoms in component 2 for Glu_12, and similarly fv(4) and 1-fv(4) for Cys_38. This ensures that the sum of occupancies for both components is held at unity. ‘21.0’ is interpreted as 1.0 times fv(2), and ‘–21.0’ as 1.0 times [1-fv(2)].<br>
This notation is not very intuitive, but it is concise and very flexible. Free variables may also be used in DFIX and CHIV restraints. Thus ‘CHIV_PRO 31 CA’ would cause the chiral volumes of all proline CA atoms to be restrained to free variable number 3, which itself is allowed to refine. In this way reasonable geometrical restraints can be applied even when the target values are unknown. By restraining distances to be equal to a free variable using DFIX, a standard deviation of the mean distance may be calculated rigorously using full-matrix least-squares algebra.<br>
If there are three or more disorder components, then each of the common occupancies must be assigned to a separate free variable (e.g. as 51, 61 and 71), and their sum can be restrained to unity by the use of a SUMP restraint (e.g. ‘SUMP 1 0.01 1 5 1 6 1 7’).<br>
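The free-variable notation described above can be decoded with a few lines of arithmetic. The sketch below is an illustration rather than SHELXL's own code: the function names are hypothetical, the FVAR values are invented, and it assumes that a code is written as 10 times the free-variable number plus a multiplier whose magnitude is below 5.

```python
def decode_sof(code, fvar):
    """Interpret an occupancy code against the FVAR list.

    fvar[0] is the overall scale factor, so free variable m is fvar[m - 1].
    Returns (value, refinable).
    """
    magnitude = abs(code)
    m = int((magnitude + 5.0) // 10)   # free-variable number (0 or 1: none)
    p = magnitude - 10 * m             # multiplier, assumed |p| < 5
    if m == 0:
        return code, True              # plain value, refined freely
    if m == 1:
        return p, False                # adding 10 fixes the parameter
    fv = fvar[m - 1]
    if code > 0:
        return p * fv, True            # 21.0 -> 1.0 * fv(2)
    return p * (1.0 - fv), True        # -21.0 -> 1.0 * [1 - fv(2)]

def sump_sum(terms, fvar):
    """Weighted sum of free variables; terms = [(coefficient, fv_number), ...]."""
    return sum(c * fvar[m - 1] for c, m in terms)

# Invented starting values: scale factor, then fv(2) ... fv(7).
fvar = [1.0, 0.6, 0.3, 0.75, 0.5, 0.3, 0.2]
occ_part1, _ = decode_sof(21.0, fvar)    # component 1 of Glu_12
occ_part2, _ = decode_sof(-21.0, fvar)   # component 2 of Glu_12
three_way = sump_sum([(1, 5), (1, 6), (1, 7)], fvar)
```

With these values the two Glu_12 components have occupancies 0.6 and 0.4, which sum to unity by construction. For three components nothing forces `three_way` to equal 1, which is why the SUMP restraint is needed.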
== Twinned crystals ==
SHELXL provides facilities for refining against data from merohedral, pseudo-merohedral, and non-merohedral twins (Herbst-Irmer & Sheldrick, 1998). Refinement against data from merohedrally twinned crystals is particularly straightforward, requiring only the twin law (a 3x3 matrix) and starting values for the volume fractions of the twin components. Failure to recognize such twinning not only results in high R-factors and poor quality maps, it can also lead to incorrect biochemical conclusions (Luecke, Richter & Lanyi, 1998). Twinning can often be detected by statistical tests (Yeates & Fam, 1999), and it is probably much more widespread in macromolecular crystals than is generally appreciated!
No changes are needed to the .hkl file for merohedral twinning, but the data should be merged in the lower of the two relevant Laue groups. For non-merohedral twinning a special (‘HKLF 5’) format is required.<br>
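For a two-component merohedral twin, the roles of the twin law and the volume fraction can be sketched as below. Two assumptions are made for illustration: the 3x3 matrix acts on the indices as a row-by-column product, and the calculated intensity is a fraction-weighted sum of the two twin-related F<sup>2</sup> values. The overall scale factor is omitted, and the function names and numbers are hypothetical.

```python
def apply_twin_law(matrix, hkl):
    """Transform Miller indices by a 3x3 twin law given as nested lists."""
    return tuple(sum(matrix[i][j] * hkl[j] for j in range(3)) for i in range(3))

def twinned_intensity(f2_calc, hkl, matrix, twin_fraction):
    """Two-component sum: (1 - a) * F2(hkl) + a * F2(twin-related hkl)."""
    related = apply_twin_law(matrix, hkl)
    return (1.0 - twin_fraction) * f2_calc[hkl] + twin_fraction * f2_calc[related]

# Illustrative twin law: swap h and k, negate l.
twin_law = [[0, 1, 0], [1, 0, 0], [0, 0, -1]]
f2_calc = {(1, 2, 3): 100.0, (2, 1, -3): 40.0}   # invented F2 values
mixed = twinned_intensity(f2_calc, (1, 2, 3), twin_law, twin_fraction=0.25)
```

Here reflection (1, 2, 3) overlaps with (2, 1, -3), so the modelled intensity is 0.75 x 100 + 0.25 x 40 = 85.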
== Unstable refinements and other problems ==
However much care is taken in setting up a refinement, it can happen that the refinement becomes unstable and diverges. Usually the program detects this in time, but in extreme cases, especially when full-matrix refinement is performed with a poorly conditioned matrix, it can crash. It is much more difficult to identify the cause of such a problem when a large number of changes have been made in updating a .res file to the .ins file for the next job, so it is often more effective to improve the model in small steps. The .lst file contains a great deal of useful diagnostic information (which can be increased by using MORE 3); however, the best place to start looking for problems is the list of ‘disagreeable restraints’, which often pinpoints the atoms or restraints that need changing. The presence of unrestrained atoms (which are commented on by the program) is also a common cause of instability. In general, the more parameters that are refined, the less stable the refinement becomes; typical examples are the inclusion of dubious solvent water molecules or making all atoms anisotropic when there are not enough data.<br>
Anti-bumping restraints are very useful in maintaining a chemically sensible structure, especially at lower resolution, but can also set traps for the unwary. For example, if two atoms that should be bonded are too far apart for the program to include them automatically in the connectivity array, an anti-bumping restraint may be generated automatically to push them apart, and this will fight against a DFIX or DANG restraint that is trying to bring them together! The remedy is to join the two atoms by hand so that they are bonded in the connectivity array, e.g.<br>
<b>BIND CB_23 CG_23</b><br>
Even if the side-chain of residue 23 in this example is disordered and the bond is only broken in one component, this will have the desired effect. An incorrect connectivity can also affect the operation of a CHIV instruction (which requires the specified atom to be bonded to three and only three non-hydrogen atoms) and the automatic generation of hydrogen atoms (HFIX). Superfluous bonds may be removed from the connectivity array using e.g.<br>
<b>FREE CB_23 CD_23</b><br>
Usually if the connectivity array (included in the .lst file except for MORE 0) is correct, the restraints will ensure that a sensible geometry is obtained during the refinement.




* go to http://shelx.uni-ac.gwdg.de/SHELX and read "SHELX-97 Manual as PDF", "Mini-protein refinement tutorial" as well as "P1-Lysozyme refinement tutorial", "Thomas Schneider's FAQs" and "FAQs: Macromolecules"
* run the option "I" in shelxpro to obtain .ins file from .pdb file; a ligand etc. may require the "J" option or http://davapc1.bioch.dundee.ac.uk/programs/prodrg/ to get restraints in SHELX format
* use "CGLS x y" refinement until convergence; the last run should be "CGLS x" only.
* a final job to get standard uncertainties (s.u., formerly e.s.d.) on all geometric parameters (see Q21 in "FAQs: Macromolecules"):
** change CGLS x y to REM CGLS x y
** insert lines L.S. 1, DAMP 0 0 and BLOC 1 (or e.g. BLOC N_1 > LAST )
** remove all restraints: lines beginning with SIMU, DELU, ISOR, BUMP, DFIX, DANG, CHIV, FLAT and NCSY (from "Mini-protein refinement tutorial"). This is only useful for high-resolution work (let's say 1.4 &Aring;). Alternatively, one can determine standard uncertainties in the Bayesian sense that take all available knowledge into account by retaining all the restraints. This may be done at more modest resolution (say 2.5 &Aring; or better). To obtain mean values and s.u. of e.g. distances or chiral volumes that occur several times in a structure, use DFIX or CHIV with "free variables". BOND, RTAB, HTAB and MPLA instructions may be needed to define the dependent parameters for which esds are required (from "FAQs: Macromolecules"). As an example, BIND FE_5001 NE2_123 together with BOND FE_5001 NE2_123 would enter the distance between FE_5001 and NE2_123 into the connectivity table, and would print out the distance and its s.u. into the .lst file.
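The edits for the final s.u. job can be scripted. The following is a minimal sketch, not a tested tool: the function name is hypothetical, it assumes that a trailing "=" continues an instruction on the next line, and it does not check for a DAMP or BLOC instruction already present in the file.

```python
RESTRAINT_KEYWORDS = ("SIMU", "DELU", "ISOR", "BUMP", "DFIX", "DANG",
                      "CHIV", "FLAT", "NCSY")

def prepare_su_job(ins_lines):
    """Return a copy of the .ins lines set up for a final s.u. cycle.

    CGLS lines become REM comments and are followed by L.S. 1, DAMP 0 0
    and BLOC 1; restraint lines (and their continuations) become REM comments.
    """
    out = []
    in_restraint = False   # True while inside a continued restraint
    for line in ins_lines:
        keyword = line[:4].upper()
        if in_restraint:
            out.append("REM " + line)
            in_restraint = line.rstrip().endswith("=")
        elif keyword == "CGLS":
            out.append("REM " + line)
            out.extend(["L.S. 1", "DAMP 0 0", "BLOC 1"])
        elif keyword in RESTRAINT_KEYWORDS:
            out.append("REM " + line)
            in_restraint = line.rstrip().endswith("=")
        else:
            out.append(line)
    return out

# Invented fragment of an .ins file, for illustration only.
example = ["TITL test", "CGLS 10", "DFIX_ALA 1.525 0.02 CA C",
           "SIMU 0.1 =", " N_1 > LAST", "FVAR 1.0 0.6", "HKLF 4"]
edited = prepare_su_job(example)
```

Using REM rather than deleting lines keeps the restraints in the file, so they are easy to restore for a restrained (Bayesian) s.u. estimate at more modest resolution.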