Theory
See ccp4dev:Refinement. For an explanation of terms, see http://www.usm.maine.edu/~rhodes/ModQual/index.html#RefineXray
Programs
restraints for ligands
All refinement programs come with a set of ligands known to them, i.e. the files describing the topology and parameters of these ligands are part of the distribution. Both Refmac and phenix.refine use one large file called mon_lib_list.cif. CNS uses files in the $CNS_TOPPAR directory.
If you have a ligand that is unknown to the refinement program, you can:
- identify a similar ligand among the known ones and modify it
- use the PRODRG server to obtain the ligand description
- use G. Kleywegt's HIC-Up to obtain the ligand description
- try to identify the ligand in the list of chemical compounds occurring in PDB files, at http://www.wwpdb.org/ccd.html - perhaps it is known under a different name than you thought, in which case you only need to adjust your PDB file
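Before resorting to the routes above, it can be worth checking whether the three-letter code already appears in the monomer library list. A minimal Python sketch; the excerpt below only mimics the layout of the _chem_comp loop in mon_lib_list.cif (in a CCP4 installation the real file typically lives under the monomer library directory, $CLIBD_MON):

```python
# Hypothetical excerpt mimicking the _chem_comp loop of mon_lib_list.cif;
# the real file ships with the CCP4/Phenix distribution.
SAMPLE = """\
loop_
_chem_comp.id
_chem_comp.three_letter_code
_chem_comp.name
ALA ALA 'ALANINE'
ATP ATP "ADENOSINE-5'-TRIPHOSPHATE"
"""

def is_known(code, listing=SAMPLE):
    """Return True if the three-letter code appears as a component id
    (first whitespace-separated field of a data row)."""
    for line in listing.splitlines():
        fields = line.split()
        if fields and fields[0].upper() == code.upper():
            return True
    return False

print(is_known("ATP"))  # already in the library
print(is_known("XYZ"))  # would need one of the routes above
```

This is only a crude text scan; a proper CIF parser would be more robust, but for a quick "is my ligand known?" check this is usually enough.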
what can go wrong in refinement?
R-factor does not go down
If this happens while the R-factor is in the 30-40% range, possible reasons include:
- wrong spacegroup (usually the true symmetry is lower; see below)
- twinning (this happens more often than you'd like, see A. A. Lebedev, A. A. Vagin and G. N. Murshudov (2006) Intensity statistics in twinned crystals with examples from the PDB. Acta Cryst. D62, 83-95)
- bad data - check the statistics of the data reduction program
- model incomplete or wrong: remove suspicious parts (or just give them an occupancy of 0), refine everything else, and check whether these parts re-appear in the map.
- explore other refinement options, e.g. TLS refinement
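For reference, the conventional R-factor that refinement programs report is R = Σ| |Fobs| - |Fcalc| | / Σ|Fobs|, so "hanging at 30-40" means this ratio is stuck around 0.30-0.40. A short sketch with made-up amplitudes:

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R: sum(| |Fo| - |Fc| |) / sum(|Fo|)."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den

# made-up structure factor amplitudes, for illustration only
f_obs  = [100.0, 80.0, 60.0, 40.0]
f_calc = [ 90.0, 85.0, 55.0, 50.0]
print(round(r_factor(f_obs, f_calc), 3))
```

A random (wrong) model gives R around 0.59 for acentric data, so values persistently above ~0.40 suggest one of the problems listed above rather than mere incompleteness.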
help, my protein has high B-factors!
This is also a FAQ on CCP4BB. The answer is: there is probably nothing wrong with it. If your crystals diffract to 3 A at a synchrotron, the average B-factor will most likely be on the order of 100 A^2. If your crystals diffract to 2 A, the average B-factor is most likely on the order of 40 A^2 or so. Use B. Rupp's calculator to find out the dependence of scattering power on B-factor.
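The underlying relation is the Debye-Waller factor: an atom's scattering amplitude at resolution d is damped by exp(-B sin^2(theta)/lambda^2) = exp(-B / (4 d^2)), with B in A^2 and d in A. A quick sketch of this dependence for the B/resolution pairs mentioned above:

```python
import math

def dw_amplitude_factor(b, d):
    """Debye-Waller damping of the scattering amplitude:
    exp(-B * sin^2(theta)/lambda^2) = exp(-B / (4 d^2)); B in A^2, d in A."""
    return math.exp(-b / (4.0 * d * d))

for b, d in [(40.0, 2.0), (100.0, 3.0)]:
    print("B=%5.1f A^2 at d=%.1f A: amplitude damped to %.3f" %
          (b, d, dw_amplitude_factor(b, d)))
```

Both combinations damp the amplitude to under 10% at the resolution limit, which is why those B values and diffraction limits tend to go together.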
R_free much higher than R
how large should the difference between R_free and R be?
Wrong space group
Sometimes crystal symmetry combines with non-crystallographic symmetry (NCS) and produces a diffraction pattern resembling a higher-symmetry space group than the one you really have. NCS in this case closely resembles crystallographic symmetry. If the resolution is not high enough, the difference in spot positions may be too small to give any detectable problems with indexing, integration and scaling. Even phasing (e.g. molecular replacement) may be successful. But if your R-factor hangs fairly high and you have problems building parts of your structure, it is worth checking other space groups. The most straightforward approach is to try processing the data in P1: if that does not bring the R-factor down significantly, other space group choices will not solve the problem either.
This occurs most often at moderate resolution. However, the structure of ketosteroid isomerase had to be refined in P1 at atomic resolution, although it refines well in C2221 at lower resolution such as 1.5 A.
Refining low resolution structures
Maintaining the secondary structure of your model when refining against weak data can be really challenging. When building manually, you may end up with a fairly large number of Ramachandran plot outliers.
Try phenix.refine with the keyword "discard_psi_phi=False". The psi and phi dihedral angles will then be restrained according to the CCP4 monomer library definitions. This was discussed on the phenixbb in July 2007; see also the ccp4bb discussion from December 2006.
If you are really desperate, another option could be to use harmonic restraints in CNS to keep your backbone fairly fixed in parts of the map where you believe the secondary structure is correct (most likely alpha-helices). You could also fix main-chain elements completely (in any refinement program), but it is definitely preferable to leave some room for change in the xyz positions, and harmonic restraints are a nice way of doing exactly that.
Bulk solvent correction produces difference density
Sometimes people observe strong residual difference density in a cavity of the protein. For example, a paper from Brian Matthews' group (Marcus D. Collins, Michael L. Quillin, Gerhard Hummer, Brian W. Matthews, Sol M. Gruner, Structural Rigidity of a Large Cavity-containing Protein Revealed by High-pressure Crystallography, Journal of Molecular Biology, Volume 367, Issue 3, 30 March 2007, Pages 752-763) described a high pressure form of lysozyme with a large hydrophobic void. Bulk water could only be compelled to enter the void by applying very high external pressure.
Bulk solvent mask artifacts can only occur at narrow channels, where the mask radius is too big to define the channel as belonging to the bulk solvent region, leaving it "empty" and thus resulting in negative difference density.
The following advice is specific to Refmac: changing from simple scaling to Babinet scaling is an important check to exclude mask bulk solvent artifacts, but in that case you have to uncheck "calculate contribution from the solvent region", because the Babinet scaling already accounts for it. Alternatively, you can optimise the solvent mask parameters by running Refmac with the keyword "solvent optimise". This will write out R and R-free for different combinations of VDW probe, ion probe, and shrinkage sizes. For subsequent Refmac runs you can use the keywords "solvent vdwprobe $VDWPROBE ionprobe $IONPROBE rshrink $RSHRINK", replacing $VDWPROBE, $IONPROBE, and $RSHRINK with the optimal values from the previous optimisation, or you can set these values in the GUI. If the peaks remain, try gradually reducing the size of the VDW probe.
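Putting the Refmac keywords above together, the keyword input for the two stages might look like the fragment below (the numeric values are placeholders, not recommendations; take the actual ones from the optimisation log of the first run):

```
# first run: scan bulk solvent mask parameters
solvent optimise

# subsequent runs: fix the best combination found in the scan
solvent vdwprobe 1.2 ionprobe 0.8 rshrink 0.8
```

These lines go into the keyword input of the Refmac job (or the "additional keywords" box of the CCP4 GUI).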
In the case of negative difference density in a big hydrophobic cavity, one possible reason is underestimated magnitudes of |Fobs| at very low resolution, either because the reflections are weakened by the beam-stop (half-)shadow, or because they are overloads that have been poorly extrapolated. A simple check for wrongly determined low-resolution |Fobs| is to cut the low-resolution data during refinement at a somewhat higher limit, say 20 A instead of 80 A, and see whether the negative difference density disappears. If it does, you should check your data processing again.
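The cutoff check above amounts to discarding reflections whose d-spacing exceeds the chosen limit. A sketch of that bookkeeping for an orthorhombic cell, where 1/d^2 = h^2/a^2 + k^2/b^2 + l^2/c^2 (cell dimensions and indices are made up):

```python
import math

def d_spacing(h, k, l, a, b, c):
    """Resolution of reflection (h,k,l) in an orthorhombic cell:
    1/d^2 = h^2/a^2 + k^2/b^2 + l^2/c^2."""
    inv_d2 = (h / a) ** 2 + (k / b) ** 2 + (l / c) ** 2
    return 1.0 / math.sqrt(inv_d2)

# made-up cell (A) and reflection indices
a, b, c = 60.0, 70.0, 80.0
refs = [(1, 0, 0), (0, 1, 0), (2, 3, 4), (10, 0, 0)]

# keep only reflections at 20 A or higher resolution (d <= 20 A)
kept = [hkl for hkl in refs if d_spacing(*hkl, a, b, c) <= 20.0]
print(kept)
```

Refinement programs apply such a low-resolution cutoff when you set the resolution range, so in practice you only change one number in the job setup; the point of the snippet is just to show which reflections get excluded.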
The other possibility of course is that the data is good, that this is an accurate experimental result and there really is a void, or at least a cavity where the mean bulk density is lower than in bulk water. One way to test the void theory would be to fill the cavity with O atoms of zero (or very small, say 0.01) occupancy. Hopefully (!) that will prevent Refmac filling the cavity with bulk solvent. One could then try giving these O atoms large B factors, say 200, to smear them out, and then increase the occupancies to titrate the actual bulk density.
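The dummy-atom test above can be scripted: fill the cavity with a coarse grid of oxygen atoms at near-zero occupancy and high B, written as fixed-column HETATM records to append to the model. A sketch; the cavity bounding box, chain ID and residue numbering are placeholders, while the occupancy and B values follow the suggestion above:

```python
def dummy_oxygens(x_range, y_range, z_range, step=2.0, occ=0.01, b=200.0):
    """Generate PDB HETATM lines for a grid of dummy water O atoms
    (fixed-column PDB format, element O right-justified in cols 77-78)."""
    lines, serial = [], 1
    x = x_range[0]
    while x <= x_range[1]:
        y = y_range[0]
        while y <= y_range[1]:
            z = z_range[0]
            while z <= z_range[1]:
                lines.append(
                    "HETATM%5d  O   HOH D%4d    %8.3f%8.3f%8.3f%6.2f%6.2f"
                    "           O" % (serial, serial, x, y, z, occ, b)
                )
                serial += 1
                z += step
            y += step
        x += step
    return lines

# placeholder cavity bounding box (A)
for line in dummy_oxygens((10.0, 12.0), (5.0, 7.0), (0.0, 2.0)):
    print(line)
```

After a refinement run one can then raise the occupancies stepwise, as described above, to titrate the actual bulk density in the cavity.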
Since 2016, so-called Polder maps in Phenix make it possible to calculate omit density without the bulk-solvent mask filling in the omitted region, which may otherwise obscure a ligand.
Model correctly placed, but difference density remains after refinement
- Fourier truncation ripples;
- refine individual anisotropic ADPs for these atoms (and isotropic for the rest);
- refine the occupancy;
- define the charge in the input PDB file;
- if it is an anomalous scatterer, use and refine f' and f''.