Xdsconv
XDSCONV is the conversion program of the XDS suite.
Possible output formats are SHELX, CNS, CCP4 (for F,SigF,DF,SigDF,isym), CCP4_F (for F,SigF,F(+),SigF(+),F(-),SigF(-)), CCP4_I (for IMEAN,SIGIMEAN,I(+),SIGI(+),I(-),SIGI(-)) and CCP4_I+F (for IMEAN,SIGIMEAN,I(+),SIGI(+),I(-),SIGI(-),FP,SIGFP,F(+),SIGF(+),F(-),SIGF(-)). The "+" and "-" varieties are only output if FRIEDEL'S_LAW=FALSE.
XDSCONV does outlier rejection in some modes.
Typical use
A typical input file XDSCONV.INP might look like
INPUT_FILE=XDS_ASCII.HKL
INCLUDE_RESOLUTION_RANGE=50 1 ! optional
OUTPUT_FILE=temp.hkl CCP4 ! Warning: do _not_ name this file "temp.mtz" !
FRIEDEL'S_LAW=FALSE ! default is FRIEDEL'S_LAW=TRUE
This produces the file temp.hkl, which is then converted to an MTZ file XDS_ASCII.mtz with the following commands (these lines are also printed out by XDSCONV):
f2mtz HKLOUT temp.mtz<F2MTZ.INP
cad HKLIN1 temp.mtz HKLOUT XDS_ASCII.mtz<<EOF
LABIN FILE 1 ALL
END
EOF
This latter step is not necessary for the CNS and SHELX output formats, which are written directly by XDSCONV. For the CNS output format, one could use MERGE=FALSE to keep observations separate. For the SHELX output format, MERGE=FALSE is the default (I guess because George Sheldrick suggests that his programs, in particular XPREP, should be fed unmerged data). However, I have sometimes obtained better SHELXD results by merging inside XDSCONV, using MERGE=TRUE.
N.B. It is good practice to always use FRIEDEL'S_LAW=FALSE - see Tips and Tricks.
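To try the merged SHELX variant mentioned above, the input file could be generated along the lines of the following sketch. The helper name write_xdsconv_inp and the output file name shelx.hkl are assumptions made for illustration; only the keywords and format names come from this page.

```shell
#!/bin/bash
# Sketch: write an XDSCONV.INP for a chosen output format.
# write_xdsconv_inp is a hypothetical helper, not part of the XDS suite.
write_xdsconv_inp() {
    local input=$1 output=$2 format=$3 merge=${4:-}
    case "$format" in
        SHELX|CNS|CCP4|CCP4_F|CCP4_I|CCP4_I+F) ;;   # formats listed at the top of this page
        *) echo "unknown output format: $format" >&2; return 1 ;;
    esac
    {
        echo "INPUT_FILE=$input"
        echo "OUTPUT_FILE=$output $format"
        echo "FRIEDEL'S_LAW=FALSE"
        # MERGE is only written when requested; otherwise the default for the format applies
        if [ -n "$merge" ]; then echo "MERGE=$merge"; fi
    } > XDSCONV.INP
}

# Example: merged SHELX output (file names are placeholders)
write_xdsconv_inp XDS_ASCII.HKL shelx.hkl SHELX TRUE
cat XDSCONV.INP
```

Running xdsconv afterwards in the same directory would then pick up this file; omitting the fourth argument leaves MERGE unset.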
How to change column labels
To have control over the column labels, one might want to modify the simple example above as:
f2mtz HKLOUT temp.mtz<F2MTZ.INP
cad HKLIN1 temp.mtz HKLOUT junk_xdsconv.mtz<<EOF
LABIN FILE 1 E1=FP E2=SIGFP E3=DANO E4=SIGDANO E5=ISYM
LABOUT FILE 1 E1=FP E2=SIGFP E3=DANO_sulf E4=SIGDANO_sulf E5=ISYM_sulf
END
EOF
The ISYM column is important if you want to run SHARP afterwards.
In the case of an MTZ file that should be used for molecular replacement and refinement, the CAD step could be used to transfer the R_free flag from a different dataset to this new dataset. Alternatively, the change of labels and the transfer of columns can be done in the ccp4i GUI.
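A minimal sketch of such a CAD step, combining the relabelling with an R_free transfer, might look as follows. The helper cad_cards and the file names reference.mtz and out.mtz are hypothetical; the LABIN/LABOUT cards follow the pattern used elsewhere on this page, and the flag name FreeR_flag is the ccp4i default.

```shell
#!/bin/bash
# Sketch: build the CAD keyword input for relabelling, optionally with R_free transfer.
# cad_cards is a hypothetical helper; "reference.mtz" and "out.mtz" are placeholder names.
cad_cards() {
    local suffix=$1 free=${2:-}
    echo "LABIN FILE 1 E1=FP E2=SIGFP E3=DANO E4=SIGDANO E5=ISYM"
    echo "LABOUT FILE 1 E1=FP E2=SIGFP E3=DANO_$suffix E4=SIGDANO_$suffix E5=ISYM_$suffix"
    if [ -n "$free" ]; then
        # the second input file carries the R_free flag to be copied
        echo "LABIN FILE 2 E1=$free"
    fi
    echo "END"
}

cad_cards sulf FreeR_flag    # inspect the cards first

# Run CAD only if it is installed and both input files exist.
if command -v cad >/dev/null 2>&1 && [ -f temp.mtz ] && [ -f reference.mtz ]; then
    cad_cards sulf FreeR_flag | cad HKLIN1 temp.mtz HKLIN2 reference.mtz HKLOUT out.mtz
fi
```

Printing the cards before piping them into cad makes it easy to check the labels against the columns actually present in the input files.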
Explanation of typical output
========== CONTROL CARDS ==========
INPUT_FILE=XDS_ASCII.HKL
OUTPUT_FILE=temp.hkl CCP4
SPACE_GROUP_NUMBER= 199
UNIT_CELL_CONSTANTS= 78.09 78.09 78.09 90.000 90.000 90.000
FRIEDEL'S_LAW=FALSE
MERGE=TRUE
NUMBER OF REFLECTION RECORDS ON INPUT FILE 217611 ! observations ("spots")
NUMBER OF IGNORED REFLECTIONS (I< -3.0*SIGMA) 0 ! merged (unique) reflections, Friedels counted separately
NUMBER OF REFLECTIONS ACCEPTED FROM INPUT FILE 23155 ! merged (unique) reflections, Friedels counted separately
NUMBER OF UNIQUE REFLECTIONS ASSIGNED TO TEST SET 0
NUMBER OF UNIQUE TEST REFLECTIONS INHERITED 0
NUMBER OF UNIQUE TEST REFLECTIONS NEWLY GENERATED 0
NUMBER OF REFLECTION RECORDS ON OUTPUT FILE 12264 ! merged (unique) reflections; a Friedel pair is counted as one reflection for the MTZ file
NUMBER OF RECORDS ASSIGNED TO WORKING SET 12264 ! but since each unique reflection is stored with its anomalous signal no information is lost
NUMBER OF RECORDS ASSIGNED TO TEST SET 0
Obviously, the meaning of the word "reflection" differs between the output lines; some explanation is given after the exclamation mark.
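To compare these counts between runs, they can be pulled out of the log with a few lines of shell. The sketch below assumes that the count is the first field following the label on the same line, as in the listing above (the text after "!" is annotation added on this page); lp_count is a hypothetical helper, and sample.LP stands in for the real log file.

```shell
#!/bin/bash
# Sketch: pull one count out of an XDSCONV log by its label.
# lp_count is a hypothetical helper; it prints the first field following the label.
lp_count() {
    local label=$1 file=${2:-XDSCONV.LP}
    awk -v label="$label" '
        (pos = index($0, label)) > 0 {
            rest = substr($0, pos + length(label))
            split(rest, fields, " ")
            print fields[1]
            exit
        }' "$file"
}

# Sample lines copied from the listing above (annotations omitted)
cat > sample.LP <<'EOF'
 NUMBER OF REFLECTION RECORDS ON INPUT FILE          217611
 NUMBER OF REFLECTIONS ACCEPTED FROM INPUT FILE       23155
 NUMBER OF REFLECTION RECORDS ON OUTPUT FILE          12264
EOF

n_in=$(lp_count "NUMBER OF REFLECTION RECORDS ON INPUT FILE" sample.LP)
n_out=$(lp_count "NUMBER OF REFLECTION RECORDS ON OUTPUT FILE" sample.LP)
echo "observations: $n_in, output records: $n_out"
```

The label is matched literally with index() rather than as a regular expression, so labels containing characters such as "(" or "*" need no escaping.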
How to obtain an MTZ file with DANO SIGDANO F(+) SIGF(+) F(-) SIGF(-)
You have to run XDSCONV twice, and combine the output with cad. At the latter step you can also change the column labels:
#!/bin/csh -f
# produce xds_allFinfo.mtz with FP SIGFP DANO SIGDANO F(+) SIGF(+) F(-) SIGF(-)
# in the same way, the labels produced with CCP4_I could be included!
#
# first xdsconv run producing FP SIGFP DANO SIGDANO
echo "INPUT_FILE= XDS_ASCII.HKL" > XDSCONV.INP
echo "OUTPUT_FILE= temp.hkl CCP4" >> XDSCONV.INP
echo "FRIEDEL'S_LAW= FALSE" >> XDSCONV.INP
xdsconv
f2mtz HKLOUT temp1.mtz<F2MTZ.INP
# second xdsconv run producing F(+) SIGF(+) F(-) SIGF(-)
echo "INPUT_FILE= XDS_ASCII.HKL" > XDSCONV.INP
echo "OUTPUT_FILE= temp.hkl CCP4_F" >> XDSCONV.INP
echo "FRIEDEL'S_LAW= FALSE" >> XDSCONV.INP
xdsconv
f2mtz HKLOUT temp2.mtz<F2MTZ.INP
# for CAD, the 2 LABOUT cards are only required if the labels should be changed
cad HKLIN1 temp1.mtz HKLIN2 temp2.mtz HKLOUT xds_allFinfo.mtz<<EOF
LABIN FILE 1 E1=FP E2=SIGFP E3=DANO E4=SIGDANO
LABIN FILE 2 E1=F(+) E2=SIGF(+) E3=F(-) E4=SIGF(-)
LABOUT FILE 1 E1=FP_Hg E2=SIGFP_Hg E3=DANO_Hg E4=SIGDANO_Hg
LABOUT FILE 2 E1=F(+)_Hg E2=SIGF(+)_Hg E3=F(-)_Hg E4=SIGF(-)_Hg
END
EOF
The following script does the same for the input file given as its first parameter, but also adds a SUFFIX (second parameter) to the column labels to better identify the data, and optionally copies the Rfree flag from a reference MTZ file (third parameter). If the Rfree flag is not named "FreeR_flag" (the default from ccp4i), you can provide its name as the fourth parameter. All steps are logged to log files, and temporary files are deleted. The input file name should end with .HKL (rather than e.g. .hkl). The script also uses sftools to set the resolution to that of the observed data; otherwise the resolution of the reference data set might be shown if that is higher. You can call this script 'xds2mtz.sh'. If it is executed without arguments, it prints a short usage instruction.
#!/bin/bash
function usage {
    echo "Usage: xds2mtz file.HKL SUFFIX [Rfree.mtz [RfreeFlag]]"
    echo ""
    echo " file.HKL: Output from XDS or XSCALE"
    echo " SUFFIX: Columns suffix, e.g. FP_SUFFIX"
    echo " Rfree.mtz: Reference mtz-file for Rfree transfer"
    echo " RfreeFlag: Label for Rfree set, defaults to \"FreeR_flag\""
    echo ""
}
# quote "$1" so that empty arguments and file names with spaces are tested correctly
if [ -z "$1" ]; then
    echo "*** Error: Missing input XDS file name"
    usage
    exit 1
fi
if [ ! -f "$1" ]; then
    echo "*** Error: File $1 does not exist"
    usage
    exit 1
fi
BASE=$(basename "$1")
SUFFIX=$2
RFREE=$3
FLAG=$4
echo "Base = $BASE, Suffix = $SUFFIX"
echo "INPUT_FILE= $1" > XDSCONV.INP
echo "OUTPUT_FILE= temp1.hkl CCP4" >> XDSCONV.INP
xdsconv && f2mtz HKLOUT temp1.mtz <F2MTZ.INP | tee ${BASE%.HKL}_dano.log
echo "INPUT_FILE= $1" > XDSCONV.INP
echo "OUTPUT_FILE= temp2.hkl CCP4_F" >> XDSCONV.INP
xdsconv && f2mtz HKLOUT temp2.mtz <F2MTZ.INP |tee ${BASE%.HKL}_pm.log
if [ -z "$3" ]; then
    echo "Proceeding without Rfree reference file"
    cad HKLIN1 temp1.mtz HKLIN2 temp2.mtz HKLOUT ${BASE%.HKL}.mtz << eof | tee ${BASE%.HKL}_cad.log
LABIN FILE 1 E1=FP E2=SIGFP E3=DANO E4=SIGDANO
LABIN FILE 2 E1=F(+) E2=SIGF(+) E3=F(-) E4=SIGF(-)
LABOUT FILE 1 E1=FP_$SUFFIX E2=SIGFP_$SUFFIX E3=DANO_$SUFFIX E4=SIGDANO_$SUFFIX
LABOUT FILE 2 E1=F(+)_$SUFFIX E2=SIGF(+)_$SUFFIX E3=F(-)_$SUFFIX E4=SIGF(-)_$SUFFIX
eof
else
    echo "Copying Rfree from file $3"
    if [ -z "$4" ]; then
        FREERFLAG="FreeR_flag" # ccp4 standard name
    else
        FREERFLAG=$4
    fi
    echo "Extracting flagged indices from ${FREERFLAG}"
    cad HKLIN1 temp1.mtz \
        HKLIN2 temp2.mtz \
        HKLIN3 "$3" \
        HKLOUT ${BASE%.HKL}.mtz << eof | tee ${BASE%.HKL}_cad.log
LABIN FILE 1 E1=FP E2=SIGFP E3=DANO E4=SIGDANO
LABIN FILE 2 E1=F(+) E2=SIGF(+) E3=F(-) E4=SIGF(-)
LABIN FILE 3 E1=${FREERFLAG}
LABOUT FILE 1 E1=FP_$SUFFIX E2=SIGFP_$SUFFIX E3=DANO_$SUFFIX E4=SIGDANO_$SUFFIX
LABOUT FILE 2 E1=F(+)_$SUFFIX E2=SIGF(+)_$SUFFIX E3=F(-)_$SUFFIX E4=SIGF(-)_$SUFFIX
LABOUT FILE 3 E1=${FREERFLAG}
eof
    rm temp1.mtz
    # correct for FreeRflag (if new file has more reflections than reference file)
    freerflag hklin ${BASE%.HKL}.mtz hklout temp1.mtz << eof | tee ${BASE%.HKL}_freerflag.log
COMPLETE FREE=${FREERFLAG}
end
eof
    # keep the completed flags, so that the sftools step below reads them
    # (otherwise the freerflag output would be overwritten without being used)
    mv temp1.mtz ${BASE%.HKL}.mtz
    # correct for real data in case Rfree data set contains too many hkls
    # thanks to Andrey Lebedev
    sftools << eof | tee ${BASE%.HKL}_sftools.log
READ ${BASE%.HKL}.mtz
SELECT ONLY COLUMN FP_$SUFFIX PRESENT
WRITE temp1.mtz
END
eof
    mv temp1.mtz ${BASE%.HKL}.mtz
fi
rm -f XDSCONV.INP temp1.hkl temp1.mtz temp2.hkl temp2.mtz F2MTZ.INP XDSCONV.LP
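The requirement that the input file name end in .HKL follows from the ${BASE%.HKL} expansions used throughout the script, which strip only that exact, upper-case suffix. The short sketch below illustrates this; mtz_name is a hypothetical helper that mirrors the expansion, and the file names are placeholders.

```shell
#!/bin/bash
# Sketch: how the script derives output names from the input file name.
# mtz_name is a hypothetical helper mirroring the ${BASE%.HKL} expansions above.
mtz_name() {
    local base
    base=$(basename "$1")
    echo "${base%.HKL}.mtz"
}

mtz_name data/XDS_ASCII.HKL   # prints XDS_ASCII.mtz: the suffix is stripped
mtz_name data/xscale.hkl      # prints xscale.hkl.mtz: a lower-case suffix is kept
```

With a lower-case suffix the script would still run, but the MTZ and log files would carry the full input name.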
Hint for long-time XDSCONV users
The latest versions of the program do not require
SPACE_GROUP_NUMBER=
UNIT_CELL_CONSTANTS=
because these are picked up from the header of the input reflection file. However, if you want to change the value of either keyword, then you have to specify both! I.e. if you want to change the space group, then you also have to specify the unit cell constants.
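The "both or neither" rule can be enforced when generating the input file, as in the following sketch. The helper add_symmetry_override is hypothetical; the cell keyword is spelled UNIT_CELL_CONSTANTS here, as echoed in the CONTROL CARDS listing on this page, and the example values are taken from that listing.

```shell
#!/bin/bash
# Sketch: append a space group / unit cell override to XDSCONV.INP, both or neither.
# add_symmetry_override is a hypothetical helper. The cell keyword is spelled
# UNIT_CELL_CONSTANTS, as echoed in the CONTROL CARDS listing on this page.
add_symmetry_override() {
    local sg=$1 cell=$2
    if [ -z "$sg" ] || [ -z "$cell" ]; then
        echo "specify both space group and unit cell, or neither" >&2
        return 1
    fi
    {
        echo "SPACE_GROUP_NUMBER=$sg"
        echo "UNIT_CELL_CONSTANTS=$cell"
    } >> XDSCONV.INP
}

echo "INPUT_FILE=XDS_ASCII.HKL" > XDSCONV.INP
echo "OUTPUT_FILE=temp.hkl CCP4" >> XDSCONV.INP
add_symmetry_override 199 "78.09 78.09 78.09 90.000 90.000 90.000"
cat XDSCONV.INP
```

If only one of the two values is supplied, the helper returns an error and leaves XDSCONV.INP unchanged, so the header values of the input reflection file remain in effect.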