Xscale
Revision as of 15:24, 11 November 2014
Simple and advanced usage
XSCALE is the scaling program of the XDS suite. It scales reflection files (typically called XDS_ASCII.HKL) produced by XDS. Since the CORRECT step of XDS already scales an individual dataset, XSCALE is only needed when several datasets are to be scaled relative to one another. However, "scaling again" in XSCALE does not deteriorate a dataset, since the supporting points of the scale factors are at the same positions in detector and batch space. The advantage of using XSCALE even for a single dataset is that the user can specify the limits of the resolution shells.
At the XDS website, there is a short and a long commented example of XSCALE.INP
A minimal input file to combine two datasets into one file is:
OUTPUT_FILE=fae-native.ahkl
INPUT_FILE= ../fae-native/xds_1/XDS_ASCII.HKL
INPUT_FILE= ../fae-native/xds_2/XDS_ASCII.HKL
Several output files can be specified (together with their set of input files) in a single run of XSCALE, simply by concatenation of sections like the above. All output files are then on the same scale - a program feature recommended for MAD data sets:
OUTPUT_FILE=fae-rh.ahkl
INPUT_FILE= ../fae-rh/xds_1/XDS_ASCII.HKL
FRIEDEL'S_LAW=FALSE STRICT_ABSORPTION_CORRECTION=TRUE ! see XDSwiki:Tips_and_Tricks
INPUT_FILE= ../fae-rh/xds_2/XDS_ASCII.HKL
FRIEDEL'S_LAW=FALSE STRICT_ABSORPTION_CORRECTION=TRUE
OUTPUT_FILE=fae-ip.ahkl
INPUT_FILE= ../fae-ip/xds_1/XDS_ASCII.HKL
FRIEDEL'S_LAW=FALSE STRICT_ABSORPTION_CORRECTION=TRUE
INPUT_FILE= ../fae-ip/xds_2/XDS_ASCII.HKL
FRIEDEL'S_LAW=FALSE STRICT_ABSORPTION_CORRECTION=TRUE
Further keywords
- RESOLUTION_SHELLS= ! for the printout of R-factors, completeness, ...
- SPACE_GROUP_NUMBER= ! if not given, picked up from first input reflection file
- UNIT_CELL_CONSTANTS= ! if not given, picked up from first input reflection file
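As an illustration of how these keywords combine (the shell limits below are invented for this sketch, not prescribed values), an XSCALE.INP for a single dataset that merely re-bins the statistics tables could look like:

```
OUTPUT_FILE=fae-native.ahkl
! explicit limits for the resolution shells of the statistics tables
! (example values only - choose limits appropriate for your data)
RESOLUTION_SHELLS= 10 6 4 3 2.5 2.2 2.0
INPUT_FILE= ../fae-native/xds_1/XDS_ASCII.HKL
```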
keywords with the same meaning as in CORRECT
- REIDX=
- REFERENCE_DATA_SET= ! see also REFERENCE_DATA_SET
- MINIMUM_I/SIGMA=
- REFLECTIONS/CORRECTION_FACTOR=
- FRIEDEL'S_LAW=
- STRICT_ABSORPTION_CORRECTION=
- INCLUDE_RESOLUTION_RANGE=
- MAXIMUM_NUMBER_OF_PROCESSORS=
- CORRECTIONS=
- NBATCH=
keywords unique to XSCALE
- MERGE= ! average intensities from all input files, applies to output file
- WEIGHT= ! applies to input file
- CRYSTAL_NAME= ! switch on radiation damage correction for individual reflections (f.i.r.)
- STARTING_DOSE= ! (optional for radiation damage correction f.i.r.)
- DOSE_RATE= ! (optional for radiation damage correction f.i.r.)
- 0-DOSE_SIGNIFICANCE_LEVEL= ! (optional for radiation damage correction f.i.r.)
Radiation damage correction
based on resolution shell and frame number
The usual correction based on resolution shell and frame number (as in MOSFLM and other programs) is performed in XDS as part of the CORRECT step; it can be switched off by omitting DECAY from the default CORRECTIONS= DECAY MODULATION ABSORP. The DECAY correction is also the default in XSCALE.
It is instructive to inspect DECAY.cbf (using "XDS-Viewer DECAY.cbf"). This visualizes the scale factors employed by the CORRECT step (the equivalent files from XSCALE are called DECAY_*.cbf); the right sidebar gives the mapping between shades of gray and numbers (1000 corresponds to a scale factor of 1). Along the horizontal axis the frame number (or rather, the batch number) is shown; along the vertical axis, the resolution shell.
for individual reflections
To "switch on" radiation damage correction of individual reflections (K. Diederichs, S. McSweeney and R. B. G. Ravelli (2003) Zero-dose extrapolation as part of macromolecular synchrotron data reduction. Acta Cryst. D59, 903-909), it suffices to use the CRYSTAL_NAME keyword. The CRYSTAL_NAME parameters of different datasets do not have to be different. If they are different, the program gains more degrees of freedom (namely, the slopes of the reflection intensity as a function of dose) to fit the observed intensity changes induced by radiation damage. However, if the datasets come from the same crystal, or from crystals grown in the same drop, it is reasonable to assume that the slopes are the same. Example:
OUTPUT_FILE=fae-merge.ahkl
INPUT_FILE= ../fae-ip/xds_1/XDS_ASCII.HKL !
CRYSTAL_NAME=ip
INPUT_FILE= ../fae-ip/xds_2/XDS_ASCII.HKL ! same crystal, but translated along z
CRYSTAL_NAME=ip
This is the recommended way as it reduces overfitting.
If, however, the crystals represent different heavy atom soaks, it is advisable to give a different CRYSTAL_NAME to each dataset. Example:
OUTPUT_FILE=hg.ahkl
INPUT_FILE= ../xds-hg/XDS_ASCII.HKL ! a mercury soak
CRYSTAL_NAME=Hg
OUTPUT_FILE=pt.ahkl
INPUT_FILE= ../xds-pt/XDS_ASCII.HKL ! a platinum soak
CRYSTAL_NAME=Pt
A word of warning: even if the internal quality indicators (R-factors) are better when using this feature, there is no guarantee that the resulting intensities will actually be better suited for your purposes than those obtained without it. In particular, extrapolating to the ends of the dose interval (0 dose and full dose) decreases the precision of the intensities.
Optimal values of dose, for interpolation
The optimal points for interpolation are near 1/4 and near 3/4 of the total dose. Details are published in Diederichs, K., Junk, M. (2009) Post-processing intensity measurements at favourable dose values. J. Appl. Cryst. 42, 48-57.
To interpolate to 22% of the full dose, one has to give a STARTING_DOSE less than zero:
OUTPUT_FILE=hg.ahkl
INPUT_FILE= ../xds-hg/XDS_ASCII.HKL ! a mercury soak
CRYSTAL_NAME=Hg
STARTING_DOSE=-22.* ! assuming the dataset has 100 frames
Explanation: the interpolation is done towards dose 0; by defining the start of the dataset to be at -22., one tells the program to calculate (by interpolation) the intensity values that would be obtained at dose 0, which in reality corresponds to (near) frame 22.
Another example: by defining STARTING_DOSE=-78.* one tells the program to calculate, by interpolation, the intensity values that would be obtained near frame 78.
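The arithmetic behind this trick can be sketched in a few lines of Python (a toy model assuming a dose of 1 per frame, as in the 100-frame example above; the function name is hypothetical and not part of XSCALE):

```python
def dose_at_frame(frame, starting_dose=0.0, dose_rate=1.0):
    """Accumulated dose at a given frame, assuming a constant dose rate."""
    return starting_dose + frame * dose_rate

# With STARTING_DOSE=-22, the extrapolation target "dose 0"
# corresponds to frame 22 of the actual measurement:
assert dose_at_frame(22, starting_dose=-22.0) == 0.0
# With STARTING_DOSE=-78, dose 0 corresponds to frame 78:
assert dose_at_frame(78, starting_dose=-78.0) == 0.0
```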
Scaling many datasets
The program has no internal limit for the number of datasets. However, many items are calculated for each pair of datasets. This results in some component of the CPU time being quadratic in the number of datasets. Nevertheless, it is perfectly possible to scale >1000 datasets.
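The quadratic component can be made concrete: with n input files, pairwise quantities are computed for n(n-1)/2 pairs. A back-of-the-envelope sketch (not XSCALE's actual code):

```python
def pairs(n):
    """Number of distinct dataset pairs among n datasets."""
    return n * (n - 1) // 2

# 100 datasets give 4950 pairs; 1000 datasets already give 499500,
# which is why this component of the CPU time grows quadratically.
assert pairs(100) == 4950
assert pairs(1000) == 499500
```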
If this is too slow, one could use an incremental way:
- Scale the first (say) 100 datasets together, including any low-resolution datasets
- For the next step, use the OUTPUT_FILE from the previous step as the first INPUT_FILE (with WFAC1=2), and the next (say) 99 datasets as further INPUT_FILEs.
- Go back to step 2 as long as more datasets remain
This should reduce the wallclock time, and simply keeps adding datasets to an ever-growing merged dataset. A nice side effect is that you can monitor how the completeness grows with each step; once good completeness is obtained, CC1/2 should start growing.
Example showing the principle:
# first step
cat <<EOF>XSCALE.INP
OUTPUT_FILE=1to100.ahkl
INPUT_FILE=../1/XDS_ASCII.HKL
INPUT_FILE=../2/XDS_ASCII.HKL
! insert lines for INPUT_FILEs 3..100
EOF
xscale_par
mv XSCALE.LP XSCALE_1to100.LP
# second step
cat <<EOF>XSCALE.INP
OUTPUT_FILE=1to200.ahkl
INPUT_FILE=1to100.ahkl
WFAC1=2 ! avoid rejecting more outliers
INPUT_FILE=../101/XDS_ASCII.HKL
INPUT_FILE=../102/XDS_ASCII.HKL
! insert lines for INPUT_FILEs 103..200
EOF
xscale_par
mv XSCALE.LP XSCALE_1to200.LP
# third step
cat <<EOF>XSCALE.INP
OUTPUT_FILE=1to300.ahkl
INPUT_FILE=1to200.ahkl
WFAC1=2 ! avoid rejecting more outliers
INPUT_FILE=../201/XDS_ASCII.HKL
INPUT_FILE=../202/XDS_ASCII.HKL
! insert lines for INPUT_FILEs 203..300
EOF
xscale_par
mv XSCALE.LP XSCALE_1to300.LP
...
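The repetitive steps above lend themselves to scripting. The following Python sketch (the function name, batch size, and ../1, ../2, ... directory layout are assumptions for illustration, not part of XDS) builds the XSCALE.INP text for one incremental step:

```python
def xscale_inp(step, batch=100, prev_output=None):
    """Build XSCALE.INP text for one incremental scaling step.

    Step 1 scales datasets 1..batch; later steps add the previous
    OUTPUT_FILE (with relaxed outlier rejection, WFAC1=2) plus the
    next block of datasets.
    """
    first = (step - 1) * batch + 1
    last = step * batch
    lines = [f"OUTPUT_FILE=1to{last}.ahkl"]
    if prev_output:
        lines.append(f"INPUT_FILE={prev_output}")
        lines.append("WFAC1=2 ! avoid rejecting more outliers")
    for i in range(first, last + 1):
        lines.append(f"INPUT_FILE=../{i}/XDS_ASCII.HKL")
    return "\n".join(lines) + "\n"

# step 2 continues from the merged file produced by step 1:
inp = xscale_inp(2, batch=100, prev_output="1to100.ahkl")
```

One would write this text to XSCALE.INP, run xscale_par, and rename XSCALE.LP after each step, exactly as in the shell example above.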
Possible problems in scaling many datasets
XSCALE may finish with the error message !!! ERROR !!! INSUFFICIENT NUMBER OF COMMON STRONG REFLECTIONS. This usually indicates that one or more datasets have too few reflections. Please inspect the table
DATA     MEAN          REFLECTIONS       INPUT FILE NAME
SET#   INTENSITY    ACCEPTED  REJECTED
and check the column "ACCEPTED REFLECTIONS". Then remove the dataset(s) with the fewest accepted reflections, and re-run the program. Repeat if necessary.
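Finding the weakest dataset(s) can be automated. The sketch below parses lines of that table under an assumed column order (set number, mean intensity, accepted, rejected, file name); the exact layout in XSCALE.LP may differ, so treat this as an illustration, not a drop-in parser:

```python
def weakest_dataset(table_lines):
    """Return (file name, accepted count) of the dataset with the
    fewest accepted reflections, assuming columns:
    SET#  MEAN INTENSITY  ACCEPTED  REJECTED  INPUT FILE NAME."""
    best_name, best_accepted = None, None
    for line in table_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip headers and blank lines
        accepted = int(fields[2])
        name = fields[4]
        if best_accepted is None or accepted < best_accepted:
            best_name, best_accepted = name, accepted
    return best_name, best_accepted

# hypothetical table rows for illustration:
table = [
    "  1   123.4   50123    12  ../1/XDS_ASCII.HKL",
    "  2   110.2     312     5  ../2/XDS_ASCII.HKL",
    "  3   130.9   48567     9  ../3/XDS_ASCII.HKL",
]
name, n = weakest_dataset(table)  # dataset 2: only 312 accepted reflections
```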
XSCALE may also finish with the error message !!! ERROR !!! INACCURATE SCALING FACTORS. This usually indicates that one or more datasets are linearly dependent on others (this happens if the same data are included more than once as INPUT_FILE), or are pure noise. I have an experimental version of XSCALE that prints out the numbers of these bad datasets.
A hint for long-time XSCALE users
a) The latest versions do not require
SPACE_GROUP_NUMBER=
UNIT_CELL_CONSTANTS=
in XSCALE.INP because these parameters are picked up from the header of the first input reflection file.
b) The VIEW program was replaced with the XDS-Viewer program.