2QVO.xds
XDS data reduction
dataset 2
This is a pared-down XDS.INP based on XDS-MARCDD.INP from the XDS distribution site. It contains only the lines that are not commented out (obtained with egrep -v '^ *!' XDS.INP); the steps leading up to it are outlined in Tutorial(First_Steps):
DETECTOR=CCDCHESS MINIMUM_VALID_PIXEL_VALUE=1 OVERLOAD=65000
DIRECTION_OF_DETECTOR_X-AXIS= 1.0 0.0 0.0
DIRECTION_OF_DETECTOR_Y-AXIS= 0.0 1.0 0.0
TRUSTED_REGION=0.0 0.99 !Relative radii limiting trusted detector region
MAXIMUM_NUMBER_OF_PROCESSORS=8!<25;ignored by single cpu version of xds
JOB= XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE CORRECT
ORGX=2000 ORGY=2048 !Detector origin (pixels)! numbers are rough estimates w/ adxv
DETECTOR_DISTANCE= 125.0 !(mm)
ROTATION_AXIS= 1.0 0.0 0.0
OSCILLATION_RANGE=1.0 !degrees (>0)
X-RAY_WAVELENGTH=1.9 !Angstroem
INCIDENT_BEAM_DIRECTION=0.0 0.0 1.0
FRACTION_OF_POLARIZATION=0.95 !default=0.5 for unpolarized beam
POLARIZATION_PLANE_NORMAL= 0.0 1.0 0.0
SPACE_GROUP_NUMBER=0 !0 for unknown crystals; cell constants are ignored.
FRIEDEL'S_LAW=FALSE !Default is TRUE.
NAME_TEMPLATE_OF_DATA_FRAMES=../../g/040707-8_2_2_1.???? ! TIFF
DATA_RANGE=1 360 !Numbers of first and last data image collected
BACKGROUND_RANGE=1 5 !Numbers of first and last data image for background
SPOT_RANGE=1 180 !First and last data image number for finding spots
REFINE(IDXREF)=BEAM AXIS ORIENTATION CELL DISTANCE
REFINE(INTEGRATE)=DISTANCE BEAM ORIENTATION CELL !AXIS
REFINE(CORRECT)=DISTANCE BEAM ORIENTATION CELL AXIS
VALUE_RANGE_FOR_TRUSTED_DETECTOR_PIXELS= 6000 30000 !Used by DEFPIX for excluding shaded parts of the detector.
INCLUDE_RESOLUTION_RANGE=50.0 0 !Angstroem; used by DEFPIX,INTEGRATE,CORRECT
MINIMUM_ZETA=0.1 !Defines width of 'blind region' (XPLAN,INTEGRATE,CORRECT)
WFAC1=1.5 !This controls the number of rejected MISFITS in CORRECT; a larger value leads to fewer rejections.
STRONG_PIXEL=6.0 !used by: COLSPOT
Using the above as XDS.INP, we run xds_par for the first time. It stops after the IDXREF step with the usual error message:
!!! ERROR !!! INSUFFICIENT PERCENTAGE (< 70%) OF INDEXED REFLECTIONS
AUTOMATIC DATA PROCESSING STOPPED. AS THE CRITERIA FOR A GOOD
SOLUTION ARE RATHER STRICT, YOU MAY CHOOSE TO CONTINUE DATA
PROCESSING AFTER CHANGING THE "JOB="-CARD IN "XDS.INP" TO
"JOB= DEFPIX INTEGRATE CORRECT".
IF THE BEST SOLUTION IS REALLY NONSENSE YOU SHOULD FIRST HAVE
A LOOK AT THE ASCII-FILE "SPOT.XDS". THIS FILE CONTAINS THE
INITIAL SPOT LIST SORTED IN DECREASING SPOT INTENSITY. SPOTS
NEAR THE END OF THE FILE MAY BE ARTEFACTS AND SHOULD BE ERASED.
ALTERNATIVELY YOU MAY TRY DIFFERENT VALUES FOR "INDEX_ORIGIN"
AS SUGGESTED IN THE ABOVE LISTING.
IF THE CRYSTAL HAS SLIPPED AT THE BEGINNING OF DATA COLLECTION
YOU MAY CHOOSE TO SKIP SOME OF THE FIRST FRAMES BY CHANGING
THE "DATA_RANGE=" IN FILE "XDS.INP" AND START ALL OVER AGAIN.
We choose to continue nevertheless and modify XDS.INP to have
JOB= DEFPIX INTEGRATE CORRECT
Again we run xds_par. This runs to completion. The automatic spacegroup determination comes up with
SPACE_GROUP_NUMBER= 75
UNIT_CELL_CONSTANTS= 53.10 53.10 40.90 90.000 90.000 90.000
Now we copy these two lines to XDS.INP, replacing the old line SPACE_GROUP_NUMBER=0 . Then we modify the spacegroup number to 77 because we know that the true spacegroup is P4_2. Also, we modify the JOB line once again:
JOB= CORRECT
and run xds_par for the last time.
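The three passes differ only in a few keyword lines of XDS.INP, so the edits can be scripted instead of made by hand. The following is a minimal sketch in POSIX sh (not the csh used elsewhere on this page). The function set_xds_keyword is a hypothetical helper, not part of XDS. It assumes that the keyword sits on a line of its own, because it replaces the whole line including any trailing comment; a keyword that is not yet present is appended at the end of the file.

```shell
#!/bin/sh
# Hypothetical helper (not part of XDS): set KEYWORD=VALUE in an XDS.INP-style file.
# ASSUMPTION: one keyword per line. The whole matching line, including any
# trailing "!comment", is replaced. If the keyword is absent, it is appended.
set_xds_keyword() {
    file=$1 key=$2 value=$3
    tmp="$file.tmp.$$"
    awk -v key="$key" -v value="$value" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # ignore leading blanks
            if (index(line, key "=") == 1) {  # line starts with KEYWORD=
                print key "=" value
                found = 1
                next
            }
            print
        }
        END { if (!found) print key "=" value }
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Example for the final pass described above (followed by running xds_par):
#   set_xds_keyword XDS.INP SPACE_GROUP_NUMBER 77
#   set_xds_keyword XDS.INP UNIT_CELL_CONSTANTS ' 53.10 53.10 40.90 90.000 90.000 90.000'
#   set_xds_keyword XDS.INP JOB ' CORRECT'
```

A plain string comparison is used rather than a regular expression, so keywords containing parentheses or apostrophes, such as REFINE(CORRECT) or FRIEDEL'S_LAW, need no escaping.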
The resulting output files are XYCORR.LP, INIT.LP, COLSPOT.LP, IDXREF.LP, DEFPIX.LP, INTEGRATE.LP and CORRECT.LP. The data files are XPARM.XDS (from IDXREF) and XDS_ASCII.HKL. All of these can be downloaded from here (right-click, then save the file to your disk).
dataset 1
This works exactly the same way as dataset 2, except that we have to replace ../../g/040707-8_2_2_1.???? by f/040707-8_2_2_1.???? where f points to the directory with the frames. All .LP files, XPARM.XDS and XDS_ASCII.HKL are here (right-click).
SHELXC/D/E structure solution
This is done in a subdirectory of the XDS data reduction directory. Here, we generate XDSCONV.INP (I used MERGE=TRUE; sometimes the results are better that way) and run xdsconv and SHELXC:
#!/bin/csh -f
cat > XDSCONV.INP <<end
INPUT_FILE=../XDS_ASCII.HKL
OUTPUT_FILE=temp.hkl SHELX
MERGE=TRUE
FRIEDEL'S_LAW=FALSE
end
xdsconv
shelxc j <<end
SAD temp.hkl
CELL 53.10 53.10 40.90 90 90 90
SPAG P42
MAXM 2
end
This writes j.hkl, j_fa.hkl and j_fa.ins. However, we overwrite j_fa.ins now:
cat > j_fa.ins <<end
TITL j_fa.ins SAD in P42
CELL 0.98000 53.10 53.10 40.90 90.00 90.00 90.00
LATT -1
SYMM -Y, X, 1/2+Z
SYMM -X, -Y, Z
SYMM Y, -X, 1/2+Z
SFAC S
UNIT 128
SHEL 999 3.0
FIND 3
NTRY 100
MIND -1.0 2.2
ESEL 1.3
TEST 0 99
SEED 1
PATS
HKLF 3
END
end
shelxd j_fa
This gives best CC All/Weak of 35.61 / 26.03 for dataset 2, and best CC All/Weak of 36.74 / 21.55 for dataset 1.
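To compare runs, the best CC All/Weak pair can be pulled out of a saved SHELXD log. The following is a minimal sketch in POSIX sh. It assumes that the terminal output of shelxd was redirected to a file (the script above does not do this, and the name shelxd.log is hypothetical). It also assumes that each try is reported on a line containing the text "CC All/Weak" followed by two numbers separated by a slash, as in the figures quoted above. The function best_cc is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print "CC_All CC_Weak" of the try with the highest CC All.
# ASSUMPTION: the log contains lines with "CC All/Weak <number> / <number>".
best_cc() {
    sed -n 's|.*CC All/Weak *\(-\{0,1\}[0-9.][0-9.]*\) */ *\(-\{0,1\}[0-9.][0-9.]*\).*|\1 \2|p' "$1" |
        LC_ALL=C sort -k1,1nr -k2,2nr |   # highest CC All first, ties broken by CC Weak
        head -n 1
}

# Example (file name is hypothetical):
#   shelxd j_fa > shelxd.log
#   best_cc shelxd.log
```

If the log layout differs from this assumption, the function prints nothing, which is easy to detect in a calling script.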
Next we run G. Sheldrick's beta version of SHELXE (version 2009/4):
shelxe.beta j j_fa -a6 -q -h -s0.55 -m20 -b
Some important lines in the output: for dataset 2, I get
79 residues left after pruning, divided into chains as follows:
A: 20 B: 22 C: 37
CC for partial structure against native data = 50.42 %
...
<wt> = 0.300, Contrast = 0.731, Connect. = 0.817 for dens.mod. cycle 20
...
Estimated mean FOM = 0.659 Pseudo-free CC = 68.71 %
for dataset 1, I get
80 residues left after pruning, divided into chains as follows:
A: 23 B: 57
CC for partial structure against native data = 45.79 %
...
<wt> = 0.300, Contrast = 0.711, Connect. = 0.812 for dens.mod. cycle 20
...
Estimated mean FOM = 0.611 Pseudo-free CC = 63.70 %
clearly indicating that the structure can be solved with each of the two datasets individually.
For completeness, we run the inverse hand:
shelxe.beta j j_fa -a6 -q -h -s0.55 -m20 -b -i
but of course this gives much worse statistics.
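The comparison of the two hands can be made explicit by extracting the last "Pseudo-free CC" value from each SHELXE log. The following is a minimal sketch in POSIX sh. It assumes that the output of the two runs was redirected to files (the names used in the example are hypothetical) and that the value is printed in the form shown in the excerpts above, e.g. "Pseudo-free CC = 68.71 %". Both function names are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: print the last "Pseudo-free CC" value found in a SHELXE log.
# ASSUMPTION: the value appears as "Pseudo-free CC = <number> %".
pseudo_free_cc() {
    sed -n 's/.*Pseudo-free CC = *\(-\{0,1\}[0-9.][0-9.]*\) *%.*/\1/p' "$1" | tail -n 1
}

# Hypothetical helper: report which of two logs has the higher pseudo-free CC.
# Prints "original", "inverted", or "unknown" (if a value could not be found).
better_hand() {
    orig=$(pseudo_free_cc "$1")
    inv=$(pseudo_free_cc "$2")
    if [ -z "$orig" ] || [ -z "$inv" ]; then
        echo "unknown"
        return 1
    fi
    awk -v a="$orig" -v b="$inv" 'BEGIN {
        if (a + 0 >= b + 0) print "original"; else print "inverted"
    }'
}

# Example (file names are hypothetical):
#   shelxe.beta j j_fa -a6 -q -h -s0.55 -m20 -b    > shelxe_orig.log
#   shelxe.beta j j_fa -a6 -q -h -s0.55 -m20 -b -i > shelxe_inv.log
#   better_hand shelxe_orig.log shelxe_inv.log
```

Only the last occurrence in each log is used, on the assumption that it reflects the final state of the run.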
Optimization of data reduction
The only safe way to optimize the data reduction is to look at external quality indicators. Internal R-factors, and even the correlation coefficient of the anomalous signal, are of comparatively little value. A readily available external quality indicator is CCmax/CCweak as obtained by SHELXD.
WFAC1 was already discussed above. Another candidate for optimization is MAXIMUM_ERROR_OF_SPOT_POSITION. By default this is 3.0. In the case of these data, this default appears to be too small, because the STANDARD DEVIATION OF SPOT POSITION (PIXELS) (as reported by IDXREF, INTEGRATE and CORRECT after refinement) is quite high (1.5 and more). This prevents XDS from using all the reflections for geometry refinement.
I found that MAXIMUM_ERROR_OF_SPOT_POSITION=6.0 significantly improved the internal statistics (mostly the R-factors, but not so much the correlation coefficient of the anomalous signal), and improved the CCmax/CCweak indicators (to more than 40). SHELXE then produces significantly better and more complete models. Try for yourself!
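To try several values side by side, one directory per trial value can be prepared, each holding a copy of XDS.INP with the keyword set. The following is a minimal sketch in POSIX sh. The function make_mesp_trials and the directory names mesp_VALUE are hypothetical. The script only prepares the input files; it does not run XDS.

```shell
#!/bin/sh
# Hypothetical helper: for each trial value, create directory mesp_<value>
# containing a copy of the given XDS.INP with
# MAXIMUM_ERROR_OF_SPOT_POSITION set to that value.
make_mesp_trials() {
    inp=$1
    shift
    for v in "$@"; do
        d="mesp_$v"
        mkdir -p "$d" || return 1
        # Drop any existing setting, then append the trial value.
        grep -v '^ *MAXIMUM_ERROR_OF_SPOT_POSITION=' "$inp" > "$d/XDS.INP"
        echo "MAXIMUM_ERROR_OF_SPOT_POSITION=$v" >> "$d/XDS.INP"
    done
}

# Example: prepare trials for the default and for the value tested above.
#   make_mesp_trials XDS.INP 3.0 6.0
```

Before running xds_par in each directory, note that a relative frame path such as ../../g/040707-8_2_2_1.???? no longer resolves from one directory level deeper, so it has to be adjusted or made absolute, and that the JOB line has to list all steps that should be repeated. Each trial can then be followed by the SHELXC/D/E steps above and judged by the resulting SHELXD statistics.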
There are some parameters in the SHELXC/D/E approach above that could be optimized as well. First of all, MERGE=TRUE in XDSCONV.INP later turned out to be the wrong choice: using the default MERGE=FALSE gives a model with 85 consecutive residues for dataset 1. Then, of course, the resolution limit for SHELXD could be varied, as could the solvent content for SHELXE. For SHELXE in particular, many things could be tried.
Limits
With dataset 2, I tried to use 270 frames but could not solve the structure using the above SHELXC/D/E approach (not even with MAXIMUM_ERROR_OF_SPOT_POSITION=6.0). With 315 frames, it was possible.