Xdsconv
XDSCONV is the conversion program of the XDS suite.
Possible output formats are SHELX, CNS, CCP4 (for F,SigF,DF,SigDF,isym), CCP4_F (for F,SigF,F(+),SigF(+),F(-),SigF(-)), CCP4_I (for IMEAN,SIGIMEAN,I(+),SIGI(+),I(-),SIGI(-)) and CCP4_I+F (for IMEAN,SIGIMEAN,I(+),SIGI(+),I(-),SIGI(-),FP,SIGFP,F(+),SIGF(+),F(-),SIGF(-)) - the "+" and "-" varieties are only output if FRIEDEL'S_LAW=FALSE.
XDSCONV does outlier rejection in some modes.
Typical use
A typical input file XDSCONV.INP might look like
INPUT_FILE=XDS_ASCII.HKL
INCLUDE_RESOLUTION_RANGE=50 1   ! optional
OUTPUT_FILE=temp.hkl CCP4       ! Warning: do _not_ name this file "temp.mtz" !
FRIEDEL'S_LAW=FALSE             ! default is FRIEDEL'S_LAW=TRUE
This produces the file temp.hkl, which is then converted to an MTZ file XDS_ASCII.mtz with the following commands (these lines are also printed out by XDSCONV):
f2mtz HKLOUT temp.mtz<F2MTZ.INP
cad HKLIN1 temp.mtz HKLOUT XDS_ASCII.mtz<<EOF
LABIN FILE 1 ALL
END
EOF
This latter step is not necessary for the CNS and SHELX output formats, which are written directly by XDSCONV. For the CNS output format, one could use MERGE=FALSE to keep observations separate. For the SHELX output format, MERGE=FALSE is the default (I guess because George Sheldrick suggests that his programs, in particular XPREP, should be fed unmerged data). However, I have sometimes obtained better SHELXD results by merging inside XDSCONV, using MERGE=TRUE.
N.B. It is good practice to always use FRIEDEL'S_LAW=FALSE - see Tips and Tricks.
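The typical-use steps above can be strung together in a small shell script. The following is only a sketch: it reuses the file names from the example (XDS_ASCII.HKL, temp.hkl, temp.mtz, XDS_ASCII.mtz), and the check for installed programs and for the input file is an addition that is not part of XDSCONV itself.

```shell
#!/bin/sh
# Sketch: write XDSCONV.INP, then run xdsconv, f2mtz and cad if they are available.

# Quoted here-document so that nothing inside is expanded by the shell.
cat > XDSCONV.INP <<'EOF'
INPUT_FILE=XDS_ASCII.HKL
OUTPUT_FILE=temp.hkl CCP4
FRIEDEL'S_LAW=FALSE
EOF

# Assumption: all three programs must be on the PATH before anything is run.
have_tools=yes
for tool in xdsconv f2mtz cad; do
    command -v "$tool" >/dev/null 2>&1 || have_tools=no
done

if [ "$have_tools" = yes ] && [ -f XDS_ASCII.HKL ]; then
    # XDSCONV writes temp.hkl and F2MTZ.INP; f2mtz and cad then build the MTZ file.
    xdsconv &&
    f2mtz HKLOUT temp.mtz < F2MTZ.INP &&
    cad HKLIN1 temp.mtz HKLOUT XDS_ASCII.mtz <<EOF
LABIN FILE 1 ALL
END
EOF
else
    echo "xdsconv, f2mtz, cad or XDS_ASCII.HKL not found; only XDSCONV.INP was written" >&2
fi
```

Chaining the three programs with && stops the conversion at the first failing step, so a broken XDSCONV run does not feed a stale temp.hkl into f2mtz.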
How to change column labels
To have control over the column labels, one might want to modify the simple example above as:
f2mtz HKLOUT temp.mtz<F2MTZ.INP
cad HKLIN1 temp.mtz HKLOUT junk_xdsconv.mtz<<EOF
LABIN FILE 1 E1=FP E2=SIGFP E3=DANO E4=SIGDANO E5=ISYM
LABOUT FILE 1 E1=FP E2=SIGFP E3=DANO_sulf E4=SIGDANO_sulf E5=ISYM_sulf
END
EOF
The ISYM column is important if you want to run SHARP afterwards.
In the case of an MTZ file that should be used for molecular replacement and refinement, the CAD step could be used to transfer the R_free flag from a different dataset to this new dataset. Alternatively, the change of labels and the transfer of columns can be done in the ccp4i GUI.
Explanation of typical output
========== CONTROL CARDS ==========

INPUT_FILE=XDS_ASCII.HKL
OUTPUT_FILE=temp.hkl CCP4

SPACE_GROUP_NUMBER= 199
UNIT_CELL_CONSTANTS= 78.09 78.09 78.09 90.000 90.000 90.000
FRIEDEL'S_LAW=FALSE
MERGE=TRUE

NUMBER OF REFLECTION RECORDS ON INPUT FILE          217611 ! observations ("spots")
NUMBER OF IGNORED REFLECTIONS (I< -3.0*SIGMA)            0 ! merged (unique) reflections, Friedels counted separately
NUMBER OF REFLECTIONS ACCEPTED FROM INPUT FILE       23155 ! merged (unique) reflections, Friedels counted separately
NUMBER OF UNIQUE REFLECTIONS ASSIGNED TO TEST SET        0
NUMBER OF UNIQUE TEST REFLECTIONS INHERITED              0
NUMBER OF UNIQUE TEST REFLECTIONS NEWLY GENERATED        0
NUMBER OF REFLECTION RECORDS ON OUTPUT FILE          12264 ! merged (unique) reflections; a Friedel pair is counted as one reflection for the MTZ file
NUMBER OF RECORDS ASSIGNED TO WORKING SET            12264 ! but since each unique reflection is stored with its anomalous signal no information is lost
NUMBER OF RECORDS ASSIGNED TO TEST SET                   0
Obviously, the meaning of the word "reflection" differs between the output lines; some explanation is given after the exclamation mark.
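To compare these counts between runs, they can be pulled out of the log with a few lines of shell. This is a minimal sketch, assuming the summary lines shown above are written to XDSCONV.LP in the current directory; lp_count is a hypothetical helper name, not part of the XDS suite.

```shell
#!/bin/sh
# Print the first number that follows a given label in an XDSCONV log.
# Usage: lp_count "LABEL TEXT" [logfile]   (logfile defaults to XDSCONV.LP)
lp_count() {
    awk -v label="$1" '
        {
            pos = index($0, label)          # literal match, not a regex
            if (pos > 0) {
                rest = substr($0, pos + length(label))
                split(rest, field, " ")     # whitespace-separated tokens
                print field[1]
                exit
            }
        }' "${2:-XDSCONV.LP}"
}

# Only report if a log file is actually present.
if [ -f XDSCONV.LP ]; then
    echo "observations read:    $(lp_count 'NUMBER OF REFLECTION RECORDS ON INPUT FILE')"
    echo "reflections accepted: $(lp_count 'NUMBER OF REFLECTIONS ACCEPTED FROM INPUT FILE')"
    echo "records written:      $(lp_count 'NUMBER OF REFLECTION RECORDS ON OUTPUT FILE')"
fi
```

The label is matched literally with index() rather than as a regular expression, because labels such as "(I< -3.0*SIGMA)" contain characters that are special in regular expressions.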
How to obtain an MTZ file with DANO SIGDANO F(+) SIGF(+) F(-) SIGF(-)
You have to run XDSCONV twice, and combine the output with cad. At the latter step you can also change the column labels:
#!/bin/csh -f
# produce xds_allFinfo.mtz with FP SIGFP DANO SIGDANO F(+) SIGF(+) F(-) SIGF(-)
# in the same way, the labels produced with CCP4_I could be included!
#
# first xdsconv run producing FP SIGFP DANO SIGDANO
echo "INPUT_FILE= XDS_ASCII.HKL" > XDSCONV.INP
echo "OUTPUT_FILE= temp.hkl CCP4" >> XDSCONV.INP
echo "FRIEDEL'S_LAW= FALSE" >> XDSCONV.INP
xdsconv
f2mtz HKLOUT temp1.mtz<F2MTZ.INP
# second xdsconv run producing F(+) SIGF(+) F(-) SIGF(-)
echo "INPUT_FILE= XDS_ASCII.HKL" > XDSCONV.INP
echo "OUTPUT_FILE= temp.hkl CCP4_F" >> XDSCONV.INP
echo "FRIEDEL'S_LAW= FALSE" >> XDSCONV.INP
xdsconv
f2mtz HKLOUT temp2.mtz<F2MTZ.INP
# for CAD, the 2 LABOUT cards are only required if the labels should be changed
cad HKLIN1 temp1.mtz HKLIN2 temp2.mtz HKLOUT xds_allFinfo.mtz<<EOF
LABIN FILE 1 E1=FP E2=SIGFP E3=DANO E4=SIGDANO
LABIN FILE 2 E1=F(+) E2=SIGF(+) E3=F(-) E4=SIGF(-)
LABOUT FILE 1 E1=FP_Hg E2=SIGFP_Hg E3=DANO_Hg E4=SIGDANO_Hg
LABOUT FILE 2 E1=F(+)_Hg E2=SIGF(+)_Hg E3=F(-)_Hg E4=SIGF(-)_Hg
END
EOF
The following script does the same for the input file (first parameter to the script), but also adds a SUFFIX (second parameter) to the column labels to better identify the data, and optionally copies the Rfree flag from a reference MTZ file (third parameter). If the Rfree flag is NOT named "FreeR_flag" (the default from ccp4i), you can provide its name as the fourth parameter. All steps are logged to log files, and temporary files are deleted. The input file name should end with .HKL (rather than e.g. .hkl). The script also uses sftools to set the resolution to that of the observed data; otherwise the resolution of the reference data set might be shown if that is higher. You can call this script 'xds2mtz.sh'. If it is executed without arguments, it prints a short usage instruction.
#!/bin/bash

function usage {
  echo "Usage: xds2mtz file.HKL SUFFIX [Rfree.mtz [RfreeFlag]]"
  echo ""
  echo "  file.HKL:  Output from XDS or XSCALE"
  echo "  SUFFIX:    Columns suffix, e.g. FP_SUFFIX"
  echo "  Rfree.mtz: Reference mtz-file for Rfree transfer"
  echo "  RfreeFlag: Label for Rfree set, defaults to \"FreeR_flag\""
  echo ""
}

# the input file name is mandatory and must point to an existing file
if [ -z "$1" ]; then
  echo "*** Error: Missing input XDS file name"
  usage
  exit 1
fi

if [ ! -f "$1" ]; then
  echo "*** Error: File $1 does not exist"
  usage
  exit 1
fi

BASE=$(basename "$1")
SUFFIX=$2
RFREE=$3
FLAG=$4

echo "Base = $BASE, Suffix = $SUFFIX"

# first xdsconv run producing FP SIGFP DANO SIGDANO
echo "INPUT_FILE= $1" > XDSCONV.INP
echo "OUTPUT_FILE= temp1.hkl CCP4" >> XDSCONV.INP
xdsconv && f2mtz HKLOUT temp1.mtz <F2MTZ.INP | tee "${BASE%.HKL}_dano.log"

# second xdsconv run producing F(+) SIGF(+) F(-) SIGF(-)
echo "INPUT_FILE= $1" > XDSCONV.INP
echo "OUTPUT_FILE= temp2.hkl CCP4_F" >> XDSCONV.INP
xdsconv && f2mtz HKLOUT temp2.mtz <F2MTZ.INP | tee "${BASE%.HKL}_pm.log"

if [ -z "$3" ]; then
  echo "Proceeding without Rfree reference file"
  cad HKLIN1 temp1.mtz HKLIN2 temp2.mtz HKLOUT "${BASE%.HKL}.mtz" << eof | tee "${BASE%.HKL}_cad.log"
LABIN FILE 1 E1=FP E2=SIGFP E3=DANO E4=SIGDANO
LABIN FILE 2 E1=F(+) E2=SIGF(+) E3=F(-) E4=SIGF(-)
LABOUT FILE 1 E1=FP_$SUFFIX E2=SIGFP_$SUFFIX E3=DANO_$SUFFIX E4=SIGDANO_$SUFFIX
LABOUT FILE 2 E1=F(+)_$SUFFIX E2=SIGF(+)_$SUFFIX E3=F(-)_$SUFFIX E4=SIGF(-)_$SUFFIX
eof
else
  echo "Copying Rfree from file $3"
  if [ -z "$4" ]; then
    FREERFLAG="FreeR_flag" # ccp4 standard name
  else
    FREERFLAG=$4
  fi
  echo "Extracting flagged indices from ${FREERFLAG}"
  cad HKLIN1 temp1.mtz \
      HKLIN2 temp2.mtz \
      HKLIN3 "$3" \
      HKLOUT "${BASE%.HKL}.mtz" << eof | tee "${BASE%.HKL}_cad.log"
LABIN FILE 1 E1=FP E2=SIGFP E3=DANO E4=SIGDANO
LABIN FILE 2 E1=F(+) E2=SIGF(+) E3=F(-) E4=SIGF(-)
LABIN FILE 3 E1=${FREERFLAG}
LABOUT FILE 1 E1=FP_$SUFFIX E2=SIGFP_$SUFFIX E3=DANO_$SUFFIX E4=SIGDANO_$SUFFIX
LABOUT FILE 2 E1=F(+)_$SUFFIX E2=SIGF(+)_$SUFFIX E3=F(-)_$SUFFIX E4=SIGF(-)_$SUFFIX
LABOUT FILE 3 E1=${FREERFLAG}
eof

  rm temp1.mtz

  # correct for FreeRflag (if new file has more reflections than reference file)
  freerflag hklin "${BASE%.HKL}.mtz" hklout temp1.mtz << eof | tee "${BASE%.HKL}_freerflag.log"
COMPLETE FREE=${FREERFLAG}
end
eof

  # correct for real data in case Rfree data set contains too many hkls
  # thanks to Andrey Lebedev
  sftools << eof | tee "${BASE%.HKL}_sftools.log"
READ ${BASE%.HKL}.mtz
SELECT ONLY COLUMN FP_$SUFFIX PRESENT
WRITE temp1.mtz
END
eof

  mv temp1.mtz "${BASE%.HKL}.mtz"
fi

# remove temporary files
rm -f XDSCONV.INP temp1.hkl temp1.mtz temp2.hkl temp2.mtz F2MTZ.INP XDSCONV.LP
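The requirement that the input file name end in .HKL follows from the ${BASE%.HKL} expansion used throughout the script above, which strips that exact, case-sensitive suffix. The short sketch below illustrates the effect; mtz_name is a hypothetical helper that only mirrors how the script derives its output file name, and the paths are placeholders.

```shell
#!/bin/sh
# mtz_name: hypothetical helper mirroring the ${BASE%.HKL}.mtz expression of the script.
mtz_name() {
    base=$(basename "$1")
    # %.HKL removes the suffix only if it matches exactly (upper case)
    printf '%s\n' "${base%.HKL}.mtz"
}

mtz_name data/XDS_ASCII.HKL   # prints XDS_ASCII.mtz
mtz_name data/xds_ascii.hkl   # prints xds_ascii.hkl.mtz (suffix not removed)
```

With a lower-case .hkl name the script would still run, but the output and log files would carry the unstripped name, for example xds_ascii.hkl.mtz.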
Hint for long-time XDSCONV users
The latest versions of the program do not require
SPACE_GROUP_NUMBER=
UNIT_CELL_CONSTANTS=
because these are picked up from the header of the input reflection file. However, if you want to change the value of either keyword, then you have to specify both. That is, if you want to change the space group, then you also have to specify the unit cell parameters.
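The "both or neither" rule can be enforced when XDSCONV.INP is generated by a script. The following is a sketch only: write_inp is a hypothetical helper, the space group and cell values are placeholders taken from the example log on this page, and the keyword spelling UNIT_CELL_CONSTANTS follows the control cards echoed in that log.

```shell
#!/bin/sh
# Hypothetical helper: write an XDSCONV input file, overriding the symmetry
# only if BOTH the space group number and the unit cell are given.
# Usage: write_inp FILE [SPACE_GROUP_NUMBER "a b c alpha beta gamma"]
write_inp() {
    out=$1
    sg=$2
    cell=$3
    # Reject the case where exactly one of the two values is supplied.
    if [ -n "$sg$cell" ] && { [ -z "$sg" ] || [ -z "$cell" ]; }; then
        echo "error: specify both space group and unit cell, or neither" >&2
        return 1
    fi
    {
        echo "INPUT_FILE=XDS_ASCII.HKL"
        echo "OUTPUT_FILE=temp.hkl CCP4"
        if [ -n "$sg" ]; then
            echo "SPACE_GROUP_NUMBER= $sg"
            echo "UNIT_CELL_CONSTANTS= $cell"
        fi
    } > "$out"
}

# Placeholder values from the example log; replace them with your own.
write_inp XDSCONV.INP 199 "78.09 78.09 78.09 90 90 90"
```

Called with only a file name, the helper writes no symmetry keywords at all, so XDSCONV takes both from the header of the input reflection file.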