Xdsconv

Latest revision as of 14:31, 22 October 2019

XDSCONV is the conversion program of the XDS suite.

Possible output formats are SHELX, CNS, CCP4 (for F,SigF,DF,SigDF,isym), CCP4_F (for F,SigF,F(+),SigF(+),F(-),SigF(-)), CCP4_I (for IMEAN,SIGIMEAN,I(+),SIGI(+),I(-),SIGI(-)) and CCP4_I+F (for IMEAN,SIGIMEAN,I(+),SIGI(+),I(-),SIGI(-),FP,SIGFP,F(+),SIGF(+),F(-),SIGF(-)). The "+" and "-" varieties are only output if FRIEDEL'S_LAW=FALSE.


XDSCONV does outlier rejection in some modes.

Typical use

A typical input file XDSCONV.INP might look like

INPUT_FILE=XDS_ASCII.HKL
INCLUDE_RESOLUTION_RANGE=50 1  ! optional 
OUTPUT_FILE=temp.hkl  CCP4     ! Warning: do _not_ name this file "temp.mtz" !
FRIEDEL'S_LAW=FALSE            ! default is FRIEDEL'S_LAW=TRUE

This produces the file temp.hkl, which is then converted to an MTZ file XDS_ASCII.mtz with the following commands (these lines are also printed out by XDSCONV):

f2mtz HKLOUT temp.mtz<F2MTZ.INP
cad HKLIN1 temp.mtz HKLOUT XDS_ASCII.mtz<<EOF
LABIN FILE 1 ALL
END
EOF

This latter step is not necessary for the CNS and SHELX output formats, which are written directly by XDSCONV. For the CNS output format, one could use MERGE=FALSE to keep observations separate. For the SHELX output format, MERGE=FALSE is the default (I guess because George Sheldrick suggests that his programs, in particular XPREP, should be fed unmerged data). However, I sometimes found that I obtained better SHELXD results with merging inside XDSCONV, using MERGE=TRUE.

N.B. It is good practice to always use FRIEDEL'S_LAW=FALSE - see Tips and Tricks.
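
The steps above can be collected into one small script. This is only a sketch: it assumes that XDS_ASCII.HKL sits in the current directory and that xdsconv, f2mtz and cad are on the PATH. If they are not, it just writes XDSCONV.INP and stops.

```shell
#!/bin/bash
# Sketch: write the example XDSCONV.INP and, if the programs are available,
# run the conversion to XDS_ASCII.mtz exactly as described above.
cat > XDSCONV.INP << 'EOF'
INPUT_FILE=XDS_ASCII.HKL
INCLUDE_RESOLUTION_RANGE=50 1  ! optional
OUTPUT_FILE=temp.hkl  CCP4     ! do not name this file "temp.mtz"
FRIEDEL'S_LAW=FALSE
EOF

if command -v xdsconv >/dev/null 2>&1 && [ -f XDS_ASCII.HKL ]; then
    # xdsconv writes temp.hkl and F2MTZ.INP; f2mtz and cad then build the MTZ file
    xdsconv && f2mtz HKLOUT temp.mtz < F2MTZ.INP && cad HKLIN1 temp.mtz HKLOUT XDS_ASCII.mtz << EOF
LABIN FILE 1 ALL
END
EOF
else
    echo "xdsconv or XDS_ASCII.HKL not found; only XDSCONV.INP was written"
fi
```

The guard around the conversion is a convenience of this sketch, not something XDSCONV requires.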

how to change column labels

To have control over the column labels, one might want to modify the simple example above as:

f2mtz HKLOUT temp.mtz<F2MTZ.INP
cad HKLIN1 temp.mtz HKLOUT junk_xdsconv.mtz<<EOF
LABIN FILE 1 E1=FP E2=SIGFP E3=DANO E4=SIGDANO E5=ISYM
LABOUT FILE 1 E1=FP E2=SIGFP E3=DANO_sulf E4=SIGDANO_sulf E5=ISYM_sulf
END
EOF

The ISYM column is important if you want to run SHARP afterwards.

In the case of an MTZ file that should be used for molecular replacement and refinement, the CAD step could be used to transfer the R_free flag from a different dataset to this new dataset. Alternatively, the change of labels and the transfer of columns can be done in the ccp4i GUI.
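
A transfer of the R_free flag with cad might look like the sketch below. The file names temp.mtz, reference.mtz and with_rfree.mtz, and the keyword file cad_rfree.inp, are hypothetical placeholders. The label FreeR_flag is assumed to be the one used in the reference file (it is the ccp4i default); adjust it if your file uses a different label.

```shell
#!/bin/bash
# Sketch: copy the free-R flag column from a reference MTZ file into the new one.
new_mtz=temp.mtz          # hypothetical: MTZ file produced by f2mtz
ref_mtz=reference.mtz     # hypothetical: older file that carries the flag
flag=FreeR_flag           # assumed column label of the flag in the reference file

# Keyword input for cad: all columns of file 1 plus the flag column of file 2.
cat > cad_rfree.inp << EOF
LABIN FILE 1 ALL
LABIN FILE 2 E1=$flag
END
EOF

if command -v cad >/dev/null 2>&1 && [ -f "$new_mtz" ] && [ -f "$ref_mtz" ]; then
    cad HKLIN1 "$new_mtz" HKLIN2 "$ref_mtz" HKLOUT with_rfree.mtz < cad_rfree.inp
else
    echo "cad or input files missing; keyword input left in cad_rfree.inp"
fi
```

Keeping the keyword input in a separate file makes it easy to inspect what was passed to cad.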

explanation of typical output

 
========== CONTROL CARDS ==========

 INPUT_FILE=XDS_ASCII.HKL
 OUTPUT_FILE=temp.hkl CCP4


 SPACE_GROUP_NUMBER=  199
 UNIT_CELL_CONSTANTS=    78.09    78.09    78.09  90.000  90.000  90.000
 FRIEDEL'S_LAW=FALSE
 MERGE=TRUE 
 NUMBER OF REFLECTION RECORDS ON INPUT FILE      217611      ! observations ("spots")
 NUMBER OF IGNORED REFLECTIONS (I< -3.0*SIGMA)        0      ! merged (unique) reflections, Friedels counted separately
 NUMBER OF REFLECTIONS ACCEPTED FROM INPUT FILE   23155      ! merged (unique) reflections, Friedels counted separately

 NUMBER OF UNIQUE REFLECTIONS ASSIGNED TO TEST SET        0
 NUMBER OF UNIQUE TEST REFLECTIONS INHERITED              0
 NUMBER OF UNIQUE TEST REFLECTIONS NEWLY GENERATED        0

 NUMBER OF REFLECTION RECORDS ON OUTPUT FILE      12264      ! merged (unique) reflections; a Friedel pair is counted as one reflection for the MTZ file
 NUMBER OF RECORDS ASSIGNED TO WORKING SET        12264      ! but since each unique reflection is stored with its anomalous signal no information is lost
 NUMBER OF RECORDS ASSIGNED TO TEST SET               0

Obviously, the meaning of the word "reflection" differs between the output lines; some explanation is given after the exclamation mark.
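
To see how the counts relate, the numbers can be pulled out of the log and divided. The sketch below works on a copy of three lines from the listing above. For a real run you would point it at XDSCONV.LP instead, assuming the count is the last field on each line, as it is here.

```shell
#!/bin/bash
# Sketch: extract reflection counts from XDSCONV output and relate them.
export LC_ALL=C            # keep "." as the decimal separator in awk output
log=$(mktemp)
cat > "$log" << 'EOF'
 NUMBER OF REFLECTION RECORDS ON INPUT FILE      217611
 NUMBER OF REFLECTIONS ACCEPTED FROM INPUT FILE   23155
 NUMBER OF REFLECTION RECORDS ON OUTPUT FILE      12264
EOF

# Print the last field of the first line that contains the given text.
count() { awk -v key="$1" 'index($0, key) { print $NF; exit }' "$log"; }

n_obs=$(count "RECORDS ON INPUT FILE")      # observations
n_acc=$(count "ACCEPTED FROM INPUT FILE")   # unique, Friedel mates separate
n_out=$(count "RECORDS ON OUTPUT FILE")     # unique, Friedel pair as one record

multiplicity=$(awk -v a="$n_obs" -v b="$n_acc" 'BEGIN { printf "%.2f", a / b }')
mates=$(awk -v a="$n_acc" -v b="$n_out" 'BEGIN { printf "%.2f", a / b }')
echo "observations per accepted reflection:   $multiplicity"
echo "accepted reflections per output record: $mates"
rm -f "$log"
```

For this example the second ratio is 1.89 rather than 2. A plausible reading is that some reflections have no separately counted Friedel mate, for instance because they are centric or because the mate was not measured.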

how to obtain an MTZ file with DANO SIGDANO F(+) SIGF(+) F(-) SIGF(-)

You have to run XDSCONV twice and combine the output with cad. In the cad step you can also change the column labels:

#!/bin/csh -f
# produce xds_allFinfo.mtz with FP SIGFP DANO SIGDANO F(+) SIGF(+) F(-) SIGF(-)
# in the same way, the labels produced with CCP4_I could be included!
#
# first xdsconv run producing FP SIGFP DANO SIGDANO
echo "INPUT_FILE= XDS_ASCII.HKL" > XDSCONV.INP
echo "OUTPUT_FILE= temp.hkl CCP4" >> XDSCONV.INP
echo "FRIEDEL'S_LAW= FALSE" >> XDSCONV.INP
xdsconv
f2mtz HKLOUT temp1.mtz<F2MTZ.INP

# second xdsconv run producing F(+) SIGF(+) F(-) SIGF(-)
echo "INPUT_FILE= XDS_ASCII.HKL" > XDSCONV.INP
echo "OUTPUT_FILE= temp.hkl CCP4_F" >> XDSCONV.INP
echo "FRIEDEL'S_LAW= FALSE" >> XDSCONV.INP
xdsconv
f2mtz HKLOUT temp2.mtz<F2MTZ.INP

# for CAD, the 2 LABOUT cards are only required if the labels should be changed
cad HKLIN1 temp1.mtz HKLIN2 temp2.mtz HKLOUT xds_allFinfo.mtz<<EOF
 LABIN  FILE 1 E1=FP       E2=SIGFP       E3=DANO     E4=SIGDANO
 LABIN  FILE 2 E1=F(+)     E2=SIGF(+)     E3=F(-)     E4=SIGF(-)
 LABOUT FILE 1 E1=FP_Hg    E2=SIGFP_Hg    E3=DANO_Hg  E4=SIGDANO_Hg
 LABOUT FILE 2 E1=F(+)_Hg  E2=SIGF(+)_Hg  E3=F(-)_Hg  E4=SIGF(-)_Hg
 END
EOF

The following script does the same for the input file (first parameter to the script), but also adds a SUFFIX (second parameter) to the column labels to better identify the data, and optionally copies the Rfree flag from a reference MTZ file (third parameter). If the Rfree flag is not named "FreeR_flag" (the default from ccp4i), you can provide its name as the fourth parameter. All steps are written to log files, and temporary files are deleted. The input file name should end with .HKL (rather than e.g. .hkl), because the script strips exactly this suffix to build the names of the output files. The script also uses sftools to limit the resolution to that of the observed data; otherwise the resolution of the reference data set might be shown if that is higher. You can call this script 'xds2mtz.sh'. If it is executed without arguments, it prints a short usage instruction.

#!/bin/bash

function usage {
echo "Usage: xds2mtz file.HKL SUFFIX [Rfree.mtz [RfreeFlag]]"
echo ""
echo "       file.HKL:  Output from XDS or XSCALE"
echo "       SUFFIX:    Columns suffix, e.g. FP_SUFFIX"
echo "       Rfree.mtz: Reference mtz-file for Rfree transfer"
echo "       RfreeFlag: Label for Rfree set, defaults to \"FreeR_flag\""
echo ""
}

if [ -z "$1" ]; then
	echo "*** Error: Missing input XDS file name"
	usage
	exit 1
fi
if [ ! -f "$1" ]; then
	echo "*** Error: File $1 does not exist"
	usage
	exit 1
fi


BASE=$(basename "$1")
SUFFIX=$2
RFREE=$3
FLAG=$4

echo "Base = $BASE, Suffix = $SUFFIX"


echo "INPUT_FILE= $1" > XDSCONV.INP
echo "OUTPUT_FILE= temp1.hkl CCP4" >> XDSCONV.INP
xdsconv && f2mtz HKLOUT temp1.mtz <F2MTZ.INP | tee ${BASE%.HKL}_dano.log

echo "INPUT_FILE= $1" > XDSCONV.INP
echo "OUTPUT_FILE= temp2.hkl CCP4_F" >> XDSCONV.INP
xdsconv && f2mtz HKLOUT temp2.mtz <F2MTZ.INP |tee ${BASE%.HKL}_pm.log

if [ -z "$3" ]; then
	echo "Proceeding without Rfree reference file"
cad HKLIN1 temp1.mtz HKLIN2 temp2.mtz HKLOUT ${BASE%.HKL}.mtz << eof | tee ${BASE%.HKL}_cad.log
LABIN  FILE 1 E1=FP       E2=SIGFP       E3=DANO     E4=SIGDANO
LABIN  FILE 2 E1=F(+)     E2=SIGF(+)     E3=F(-)     E4=SIGF(-)
LABOUT FILE 1 E1=FP_$SUFFIX    E2=SIGFP_$SUFFIX    E3=DANO_$SUFFIX  E4=SIGDANO_$SUFFIX
LABOUT FILE 2 E1=F(+)_$SUFFIX  E2=SIGF(+)_$SUFFIX  E3=F(-)_$SUFFIX  E4=SIGF(-)_$SUFFIX
eof
else 
	echo "Copying Rfree from file $3"
	if [ -z "$4" ]; then
		FREERFLAG="FreeR_flag" # ccp4 standard name
	else
		FREERFLAG=$4
	fi
	echo "Extracting flagged indices from ${FREERFLAG}"
	cad 	HKLIN1 temp1.mtz \
		HKLIN2 temp2.mtz \
		HKLIN3 $3 \
		HKLOUT ${BASE%.HKL}.mtz << eof | tee ${BASE%.HKL}_cad.log
LABIN  FILE 1 E1=FP       E2=SIGFP       E3=DANO     E4=SIGDANO
LABIN  FILE 2 E1=F(+)     E2=SIGF(+)     E3=F(-)     E4=SIGF(-)
LABIN  FILE 3 E1=${FREERFLAG}
LABOUT FILE 1 E1=FP_$SUFFIX    E2=SIGFP_$SUFFIX    E3=DANO_$SUFFIX  E4=SIGDANO_$SUFFIX
LABOUT FILE 2 E1=F(+)_$SUFFIX  E2=SIGF(+)_$SUFFIX  E3=F(-)_$SUFFIX  E4=SIGF(-)_$SUFFIX
LABOUT FILE 3 E1=${FREERFLAG}
eof

rm temp1.mtz

# correct for FreeRflag (if new file has more reflections than reference file)
freerflag hklin ${BASE%.HKL}.mtz hklout temp1.mtz << eof | tee ${BASE%.HKL}_freerflag.log
COMPLETE FREE=${FREERFLAG}
end
eof

# correct for real data in case Rfree data set contains too many hkls
# thanks to Andrey Lebedev
# read the freerflag output (temp1.mtz) so that the completed flags are kept
rm -f temp3.mtz
sftools << eof | tee ${BASE%.HKL}_sftools.log
READ temp1.mtz
SELECT ONLY COLUMN FP_$SUFFIX PRESENT
WRITE temp3.mtz
END
eof

mv temp3.mtz ${BASE%.HKL}.mtz

fi

rm -f XDSCONV.INP temp1.hkl temp1.mtz temp2.hkl temp2.mtz F2MTZ.INP XDSCONV.LP
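
The requirement that the input file end with .HKL follows from the way the script builds its output names with ${BASE%.HKL}. The short sketch below shows the effect; the two paths and the helper name mtz_name are made up for the illustration.

```shell
#!/bin/bash
# Sketch: how the output name is derived from the input file name.
mtz_name() {
    local base
    base=$(basename "$1")        # drop the directory part
    echo "${base%.HKL}.mtz"      # strip a trailing ".HKL" (case-sensitive), add ".mtz"
}

for input in /data/run1/XDS_ASCII.HKL /data/run1/xscale.hkl; do   # hypothetical paths
    echo "$input -> $(mtz_name "$input")"
done
```

The first name becomes XDS_ASCII.mtz, while the lowercase name keeps its extension and becomes xscale.hkl.mtz, because the pattern only matches the uppercase suffix.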

Hint for long-time XDSCONV users

The latest versions of the program do not require

SPACE_GROUP_NUMBER=
UNIT_CELL_CONSTANTS=

because these are picked up from the header of the input reflection file. However, if you want to change the value of either keyword, you have to specify both. For example, if you want to change the space group, you also have to specify the unit cell parameters.
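
A small check along these lines can catch an input file that sets only one of the two keywords. It is a sketch with several assumptions: the cell keyword is taken to be spelled UNIT_CELL_CONSTANTS, as echoed in the output listing above; the space group number 197 is an arbitrary example of a changed value; and the function name is invented here.

```shell
#!/bin/bash
# Sketch: verify that an XDSCONV.INP sets both keywords or neither of them.
check_cell_and_spacegroup() {
    local sg cell
    sg=$(grep -c '^ *SPACE_GROUP_NUMBER=' "$1")
    cell=$(grep -c '^ *UNIT_CELL_CONSTANTS=' "$1")   # assumed keyword spelling
    if [ "$sg" -gt 0 ] && [ "$cell" -gt 0 ]; then return 0; fi
    if [ "$sg" -eq 0 ] && [ "$cell" -eq 0 ]; then return 0; fi
    return 1                                         # only one keyword present
}

inp=$(mktemp)
cat > "$inp" << 'EOF'
INPUT_FILE=XDS_ASCII.HKL
OUTPUT_FILE=temp.hkl CCP4
SPACE_GROUP_NUMBER=197          ! hypothetical new space group
UNIT_CELL_CONSTANTS=78.09 78.09 78.09 90.000 90.000 90.000
EOF

if check_cell_and_spacegroup "$inp"; then
    echo "space group and cell are consistent in $inp"
else
    echo "specify SPACE_GROUP_NUMBER and UNIT_CELL_CONSTANTS together"
fi
```

The check only looks at which keywords are present. It does not test whether the cell is compatible with the space group.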