Xdsstat

Latest revision as of 20:36, 26 May 2023

XDSSTAT is a home-brewed program that prints various statistics (that are not available from XDS itself) in the form of tables and images.

Usage

The program reads a reflection file (default: XDS_ASCII.HKL) written by CORRECT or XSCALE (MERGE=FALSE). Before using the program, you have to set up a CCP4 environment, because it uses CCP4 routines and files.

As the output is long, it should be called as

xdsstat > XDSSTAT.LP

This might eventually be changed to the XDS style.

The program may be called with two parameters which define the resolution range of data to be read:

xdsstat 20 3 > XDSSTAT.LP

Visualization of part of the tabular output (tables 1 and 2) may be done with the CCP4 program loggraph.

Features

Tables

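  • statistics for each frame: # reflections, # misfits, I, sigma(I), I/sigma(I), fraction of reflections observed, correlation with standard profiles, R_meas and # reflections used for R_meas, # unique reflections which only occur on this frame and would thus be lost if this frame were deleted from the dataset. These lines end with " L", which may be used for "grepping" them from XDSSTAT.LP:
grep ' L$' XDSSTAT.LP > L
  • R-factors as a function of frame number difference (R_d, see Diederichs K. (2006) Some aspects of quantitative analysis and correction of radiation damage. Acta Cryst D62, 96-101). These lines end with " DIFFERENCE", which may be used for "grepping" them from XDSSTAT.LP:
grep DIFFERENCE XDSSTAT.LP > D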

In the R_d plot, the red line is a fit to the data, and the green line marks sqrt(2)*(R_d at zero dose). At the dose (frame number) where R_d is sqrt(2) times as high as at the beginning (i.e. where the red line intersects the green line), radiation damage is the dominant source of error. It appears sensible to discard frames beyond this point.
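
The sqrt(2) criterion can be sketched as a small shell function. This is only an illustration under assumptions that the text does not state: the fit is taken to be a least-squares straight line, R_d at zero dose is taken to be the intercept of that line, and the input is assumed to be already reduced to two whitespace-separated columns (frame number difference, R_d). The function name rd_crossing is hypothetical and not part of XDSSTAT.

```shell
# rd_crossing: hypothetical helper, not part of XDSSTAT.
# Reads "frame_difference R_d" pairs from stdin (assumed layout), fits
# R_d = a + b*x by least squares, and prints the x at which the fitted
# line reaches sqrt(2)*a, i.e. x = a*(sqrt(2)-1)/b.
rd_crossing() {
  awk '
    NF >= 2 { n++; sx += $1; sy += $2; sxx += $1 * $1; sxy += $1 * $2 }
    END {
      denom = n * sxx - sx * sx
      if (n < 2 || denom == 0) { print "not enough data for a fit"; exit 1 }
      b = (n * sxy - sx * sy) / denom   # slope of the fitted line
      a = (sy - b * sx) / n             # intercept: fitted R_d at zero dose
      if (a <= 0 || b <= 0) { print "fit does not rise: no crossing"; exit 1 }
      printf "%.1f\n", a * (sqrt(2) - 1) / b
    }'
}

# Example with made-up values lying exactly on R_d = 0.05 + 0.001*x
printf '0 0.05\n10 0.06\n20 0.07\n30 0.08\n' | rd_crossing   # prints 20.7
```

To apply this to real output, the two relevant columns would first have to be extracted from the file D written by the grep command; which columns hold the frame number difference and R_d is not specified here and should be checked in XDSSTAT.LP.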

  • R_meas as a function of the percentage of expected profile available for integration ("PEAK"), and logarithm of intensity. This table is only available for reflection files written by XDS; the information needed for the table is not in the files written by XSCALE. This table is most relevant for high-mosaicity datasets, and for datasets with few frames.
    A bit more explanation: the number PEAK is the same as partiality. For example, reflections with PEAK of 75 are "3/4 fullies". These are "scaled up" by CORRECT, for example reflections with PEAK=75 are simply multiplied by 4/3 to recover the "full" intensity (which is written to XDS_ASCII.HKL after scaling). The same scaling-up is done for the sigmas of the reflections.
    Of course, the PEAK value is itself a bit uncertain, and this uncertainty should in principle be taken into account when scaling-up the sigmas. This is not done since the uncertainty of PEAK is unknown.
    The table gives (by rows), for values of PEAK from MINPK to 100, the R_meas of the reflections with that value of PEAK. Weak reflections are in the leftmost columns, and the strongest reflections are in the rightmost columns. From column to column the intensity cutoff rises by a factor of 2. The next line then reports the number of reflections of that PEAK and intensity.
    The idea is that if, for example, strong reflections at PEAK=75 give bad R_meas values, but reflections of the same intensity (same column) give good R_meas values starting at PEAK=80, then you can (and probably should) raise MINPK to 80.
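
The scaling-up arithmetic described for PEAK can be written out as a one-line calculation. This is only a sketch of the arithmetic as stated in the text (multiplication by 100/PEAK), not the code used by CORRECT; the function name scale_up and its argument order are made up for illustration.

```shell
# scale_up PEAK I SIGMA: hypothetical helper illustrating the arithmetic.
# Multiplies intensity and sigma of a partial reflection by 100/PEAK,
# e.g. by 4/3 for PEAK=75.
scale_up() {
  awk -v peak="$1" -v i="$2" -v sigma="$3" 'BEGIN {
    if (peak <= 0 || peak > 100) { print "PEAK must be in (0,100]"; exit 1 }
    f = 100 / peak                       # scale factor, 4/3 for PEAK=75
    printf "%.1f %.1f\n", i * f, sigma * f
  }'
}

scale_up 75 300 30   # prints "400.0 40.0"
```

Because intensity and sigma are multiplied by the same factor, I/sigma(I) of the reflection is unchanged by this step.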

The first two of these tables may be visualized with loggraph.

Images

The following quantities are mapped onto the detector surface:

  • misfits.pck: outliers identified in CORRECT. Useful e.g. to identify ice rings.
  • corr.pck: correlation with standard profiles
  • peaks.pck: completeness of profiles
  • rf.pck: R-factor (very interesting)
  • anom.pck: anomalous signal (very interesting)
  • scales.pck: intensity ratios between symmetry-related reflections, after scaling (very interesting)
  • nobs.pck: observed reflections (not very interesting)
  • rlps.pck: reciprocal Lorentz factor (not very interesting)

These images are in the .pck format and may be visualized with the (obsolete) VIEW program; it should also be possible to use XDS-Viewer.

N.B.: Images are currently only produced for reflection files coming from XDS, not for those from XSCALE. The reason is that the latter lack a line like

!NX=  2048  NY=  2048    QX=  0.079090  QY=  0.079090

which is used to tell the program how big the detector is (only NX and NY are actually required). If you want image output for an XSCALE reflection file, just copy this line into it.
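
Copying the line can be scripted, for instance as follows. This is a minimal sketch, not an official procedure: the text only says to copy the line into the file, so placing it directly before the !END_OF_HEADER line is an assumption, and the function name copy_nx_line is hypothetical.

```shell
# copy_nx_line XDS_FILE XSCALE_FILE OUT_FILE: hypothetical helper.
# Takes the first "!NX=" line from XDS_FILE and writes a copy of
# XSCALE_FILE to OUT_FILE with that line inserted just before
# "!END_OF_HEADER" (the insertion point is an assumption).
copy_nx_line() {
  nx_line=$(grep -m 1 '^!NX=' "$1") || return 1
  awk -v nx="$nx_line" '
    /^!END_OF_HEADER/ && !done { print nx; done = 1 }  # insert once
    { print }                                          # copy every line
  ' "$2" > "$3"
}
```

A call might look like copy_nx_line XDS_ASCII.HKL XSCALE_IN.HKL XSCALE_WITH_NX.HKL, where the last two file names are placeholders. The original XSCALE file is left untouched, so the result can be inspected before running xdsstat on it.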

Availability

Linux and Mac binaries are available. I (Kay dot Diederichs at uni-konstanz dot de) appreciate feedback. The source code is in Fortran90 and requires a Fortran90-compiled CCP4 library, so few people are currently in a position to compile and link the program.