== Simple and advanced usage ==


[http://xds.mpimf-heidelberg.mpg.de/~kabsch/xds/html_doc/xscale_parameters.html XSCALE] is the stand-alone scaling program of the XDS suite. It scales reflection files (typically called XDS_ASCII.HKL) produced by XDS. Since the CORRECT step of XDS ''already scales'' an individual dataset, XSCALE is only ''needed'' if several datasets should be scaled relative to one another. However, it does not deteriorate (over-fit) a dataset if it is "scaled again" in XSCALE, since the supporting points of the scale factors are at the same positions in detector and batch space.
 
One advantage of using XSCALE for a single dataset is that the user can specify the number and limits of the resolution shells. Another is that zero-dose extrapolation can be done.
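For instance, a minimal XSCALE.INP that merely re-scales a single dataset with user-chosen shell limits might look like this (the output file name and the shell limits are illustrative, not recommendations):
  OUTPUT_FILE=temp.ahkl
  INPUT_FILE=XDS_ASCII.HKL
  RESOLUTION_SHELLS=10 6 4 3 2.5 2.1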
   
   
The XDS website provides a short and a long commented example of [http://xds.mpimf-heidelberg.mpg.de/html_doc/INPUT_templates/XSCALE.INP XSCALE.INP].


----
  FRIEDEL'S_LAW=FALSE   
  STRICT_ABSORPTION_CORRECTION=TRUE        ! see XDSwiki:Tips_and_Tricks
  ! the star in front of the file name indicates that it is the reference with respect to falloff
  INPUT_FILE= *../fae-rh/xds_2/XDS_ASCII.HKL
  FRIEDEL'S_LAW=FALSE
  STRICT_ABSORPTION_CORRECTION=TRUE
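XSCALE reads its input from XSCALE.INP in the current directory; to apply such a file, simply run the program (xscale_par is the parallel version, also used in the script example further below):
  xscale_par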
== Further keywords ==


* [http://xds.mpimf-heidelberg.mpg.de/html_doc/xscale_parameters.html#RESOLUTION_SHELLS= RESOLUTION_SHELLS=]           ! for the printout of R-factors, completeness, ...
* SPACE_GROUP_NUMBER=          ! if not given, picked up from first input reflection file
* UNIT_CELL_CONSTANTS=          ! if not given, picked up from first input reflection file
* DOSE_RATE=                    ! (optional, for radiation damage correction of individual reflections)
* 0-DOSE_SIGNIFICANCE_LEVEL=    ! (optional, for radiation damage correction of individual reflections)
* SAVE_CORRECTION_IMAGES=        ! Default is TRUE. If FALSE, don't write DECAY*.cbf MODPIX*.cbf ABSORP*.cbf
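As a sketch of how these keywords are used, they are given at the top level of XSCALE.INP (all values below are illustrative, not recommendations):
  SPACE_GROUP_NUMBER=19
  UNIT_CELL_CONSTANTS=60.1 70.2 80.3 90 90 90
  RESOLUTION_SHELLS=10 5 3 2.5 2.2 2.0
  SAVE_CORRECTION_IMAGES=FALSE   ! don't write DECAY*.cbf MODPIX*.cbf ABSORP*.cbf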


== Radiation damage correction ==
=== based on resolution shell and frame number ===


The usual correction (like in AIMLESS and SCALEPACK) based on resolution shell and frame number is performed in [[XDS]] as part of the CORRECT step; it can be switched off by omitting DECAY from the default CORRECTIONS= DECAY MODULATION ABSORP. DECAY correction is also the default in XSCALE.
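For example, to switch the decay correction off in the CORRECT step (a sketch; the other two corrections stay enabled), XDS.INP would contain:
  CORRECTIONS= MODULATION ABSORP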


It is instructive to inspect DECAY.cbf (using "XDS-Viewer DECAY.cbf"). This visualizes the scale factors employed by the CORRECT step (the equivalent files from XSCALE are called DECAY_*.cbf); the right sidebar gives the mapping between shades of gray and numbers (1000 corresponds to a scale factor of 1). Along the horizontal axis the frame number (or rather the batch number) is shown, along the vertical axis the resolution shell.


=== for individual reflections: zero-dose extrapolation ===


To "switch on" radiation damage correction of individual reflections ([http://dx.doi.org/10.1107/S0907444903006516 K. Diederichs, S. McSweeney and R. B. G. Ravelli (2003) Zero-dose extrapolation as part of macromolecular synchrotron data reduction. ''Acta Cryst.'' '''D59''', 903-909]) it suffices to use the CRYSTAL_NAME keyword. '''The CRYSTAL_NAME parameters of different datasets do not have to be different'''. If they are different, this results in more degrees of freedom (namely, the slopes of the reflection intensity as a function of dose) for the program to fit the observed changes of intensities which are induced by radiation damage. However, if the datasets are based on the same crystal, or the datasets are based on crystals from the same drop, it is reasonable to assume that the slopes are the same.  
To "switch on" radiation damage correction of individual reflections ([http://dx.doi.org/10.1107/S0907444903006516 K. Diederichs, S. McSweeney and R. B. G. Ravelli (2003) Zero-dose extrapolation as part of macromolecular synchrotron data reduction. ''Acta Cryst.'' '''D59''', 903-909]) it suffices to use the CRYSTAL_NAME keyword. '''The CRYSTAL_NAME parameters of different datasets do not have to be different'''. If they are different, this results in more degrees of freedom (namely, the slopes of the reflection intensity as a function of dose) for the program to fit the observed changes of intensities which are induced by radiation damage. However, if the datasets are based on the same crystal, or the datasets are based on crystals from the same drop, it is reasonable to assume that the slopes are the same.  
   CRYSTAL_NAME=Hg
   STARTING_DOSE=-22.*  ! assuming the dataset has 100 frames
Explanation: the interpolation is done towards 0, and by defining the start of the dataset to be at -22., one tells the program to calculate (by interpolation) intensity values that would be obtained at dose 0, which in reality is near frame 22.


Another example: by defining STARTING_DOSE=-78.* one would tell the program to calculate, by interpolation, the intensity values that would be obtained near frame 78.


== Scaling many datasets ==
The program has no internal limit for the number of datasets. However, many items are calculated for ''each pair of datasets''. This results in some component of the CPU time being quadratic in the number of datasets. Beyond about 100 datasets, this component dominates, and calculation times become ''really long'' (hours to days). However, it is possible to approach the problem in an ''incremental'' way:


# Scale the first (say) 100 datasets together, including any low-resolution datasets
# For the next step, use the OUTPUT_FILE from the previous step as the first INPUT_FILE (with WFAC1=2), and the next (say) 99 datasets as further INPUT_FILEs.
# Repeat step 2 as long as you have more datasets.


This should keep the wallclock requirements reasonable; it simply keeps adding datasets to an ever-growing merged dataset. The nice thing is that you can monitor how the completeness grows with each step, and once good completeness is obtained, CC1/2 should start growing.
Example showing the principle:
  <nowiki>
# first step
cat <<EOF>XSCALE.INP
OUTPUT_FILE=1to100.ahkl
INPUT_FILE=../1/XDS_ASCII.HKL
INPUT_FILE=../2/XDS_ASCII.HKL
! insert lines for INPUT_FILEs 3..100
EOF
xscale_par
mv XSCALE.LP XSCALE_1to100.LP
# second step
cat <<EOF>XSCALE.INP
OUTPUT_FILE=1to200.ahkl
INPUT_FILE=1to100.ahkl
WFAC1=2   ! avoid rejecting more outliers
INPUT_FILE=../101/XDS_ASCII.HKL
INPUT_FILE=../102/XDS_ASCII.HKL
! insert lines for INPUT_FILEs 103..200
EOF
xscale_par
mv XSCALE.LP XSCALE_1to200.LP
   
# third step
cat <<EOF>XSCALE.INP
OUTPUT_FILE=1to300.ahkl
INPUT_FILE=1to200.ahkl
WFAC1=2  ! avoid rejecting more outliers
INPUT_FILE=../201/XDS_ASCII.HKL
INPUT_FILE=../202/XDS_ASCII.HKL
! insert lines for INPUT_FILEs 203..300
EOF
xscale_par
mv XSCALE.LP XSCALE_1to300.LP
  ...
  </nowiki>

When scaling e.g. hundreds of partial datasets, XSCALE may finish with the error message !!! ERROR !!! INSUFFICIENT NUMBER OF COMMON STRONG REFLECTIONS. This usually indicates that one or more datasets have too few reflections. Please inspect the table
  DATA   MEAN      REFLECTIONS        INPUT FILE NAME
  SET# INTENSITY ACCEPTED REJECTED
and check the column "ACCEPTED REFLECTIONS". Then remove the dataset(s) with fewest accepted reflections, and re-run the program. Repeat if necessary.
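One way to spot the weakest datasets is to pull that table out of XSCALE.LP and sort it numerically on the ACCEPTED column (a sketch, assuming the column layout shown above; the field number may need adjusting for your version):
  awk '/INPUT FILE NAME/,/^ *$/' XSCALE.LP | sort -n -k3 | head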


The latest XSCALE (March 1, 2015) makes it explicit which dataset(s) it cannot scale; it prints out e.g. "no common reflections with data set 197". If you get this message for many datasets, I suggest adding a line
  MINIMUM_I/SIGMA=2 ! reduce to 1, or 0.5, or 0.25, or 0.125, or ... to lower the cutoff
after each INPUT_FILE= line, to increase the number of reflections available for scaling. However, MINIMUM_I/SIGMA= should not be decreased needlessly below its default of 3.

Old versions of XSCALE may also finish with the error message !!! ERROR !!! INACCURATE SCALING FACTORS. This usually indicates that one or more datasets linearly depend on others (this happens if the ''same'' data are included more than once as INPUT_FILE), or are pure noise. The latest version of XSCALE (March 1, 2015) copes much better with this situation; I have not seen this error message any more.

== A hint for long-time XSCALE users ==

a) The latest versions do not require
  SPACE_GROUP_NUMBER=
  UNIT_CELL_PARAMETERS=
in XSCALE.INP because these parameters are picked up from the header of the first input reflection file.

b) The [[VIEW]] program was replaced with the [[XDS-Viewer]] program.