Eiger: Difference between revisions
Latest revision as of 22:07, 16 August 2022
Processing of Eiger data is different from processing of conventional data, because the frames are wrapped into HDF5 files (often ending with .h5). However, with the LIB feature of XDS and a suitable plugin (preferably Neggia, or Durin for data collected at Diamond Light Source), processing is efficient.
General aspects
- The framecache of XDS uses memory to save on I/O: it keeps a frame in RAM after reading it for the first time. By default, each XDS (or mcolspot/mintegrate) job stores NUMBER_OF_IMAGES_IN_CACHE=DELPHI/OSCILLATION_RANGE+1 images in memory, which corresponds to one DELPHI-sized batch of data. This requires (number of pixels)*(number of jobs)*4 bytes per frame, which amounts to about 72 MB for the Eiger 16M when running with MAXIMUM_NUMBER_OF_JOBS=1. (With DELPHI=20 and OSCILLATION_RANGE=0.05, your computer thus needs at least 400*72 MB = 29 GB of memory for each job!) If memory allocation fails, XDS falls back to the old behaviour of reading each frame three times (instead of once).
- Apart from the framecache, XDS needs (number of jobs)*(number of processes)*NX*NY*4 Bytes, plus about one GB for the code.
- Dectris provides the Neggia library (source: https://github.com/dectris/neggia, binary: https://www.dectris.com/support/downloads/sign-in) for native reading of HDF5 files, which can be loaded into XDS at runtime using the LIB= keyword. With this library (which can also be found at https://wiki.uni-konstanz.de/pub/linux_bin for Linux, and at https://wiki.uni-konstanz.de/pub/mac_bin for MacOS), no conversion to CBF or any other format is necessary. Reading HDF5 files is therefore just as fast and efficient as reading any other file format. At Diamond Light Source, a different HDF5 format was developed, and this requires the Durin plugin (https://github.com/DiamondLightSource/durin/releases/latest). Durin can also read the HDF5 files written by the Dectris software, but it does not read frames in parallel, so it is slower.
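The memory formulas in the list above can be turned into a quick estimate before starting a job. The following is only a sketch: the detector dimensions (4150 x 4371 pixels for the Eiger 16M), the number of processes, and the function name estimate_xds_memory are assumptions chosen for illustration, and the result is a rough lower bound rather than what XDS will actually allocate.

```shell
#!/bin/bash
# Rough RAM estimate for XDS, following the formulas in the list above.
# All values are example inputs (assumptions), not recommendations.
NX=4150; NY=4371          # detector size in pixels (assumed: Eiger 16M)
DELPHI=20                 # degrees per DELPHI-sized batch
OSCILLATION_RANGE=0.05    # degrees per frame
JOBS=1                    # MAXIMUM_NUMBER_OF_JOBS
PROCESSES=8               # processes per job (hypothetical value)

estimate_xds_memory() {
  LC_ALL=C awk -v nx="$1" -v ny="$2" -v delphi="$3" -v osc="$4" \
               -v jobs="$5" -v procs="$6" 'BEGIN {
    frames = int(delphi / osc + 0.5) + 1          # NUMBER_OF_IMAGES_IN_CACHE
    frame_bytes = nx * ny * 4                     # 4 bytes per pixel
    cache = frames * frame_bytes * jobs           # framecache
    other = jobs * procs * frame_bytes + 1e9      # working arrays plus about 1 GB for the code
    printf "frames_in_cache=%d cache_GB=%.1f other_GB=%.1f\n", frames, cache / 1e9, other / 1e9
  }'
}

estimate_xds_memory "$NX" "$NY" "$DELPHI" "$OSCILLATION_RANGE" "$JOBS" "$PROCESSES"
```

With these example inputs the framecache term comes out at about 29 GB, in line with the figure quoted above. Reducing DELPHI or the number of jobs lowers it proportionally.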
A suitable XDS.INP may have been written by the data collection (beamline) software. If XDS.INP is not available, the latest version of generate_XDS.INP (run as generate_XDS.INP xxx_master.h5) or the XDS_from_H5.py script can be used to create it.
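Whichever way XDS.INP was produced, it is worth checking that it contains a LIB= line pointing at the plugin. A minimal sketch of such a check follows. The helper name ensure_lib_line is hypothetical, and the plugin path in the usage comment is an assumption; use the location where Neggia or Durin is actually installed on your system.

```shell
#!/bin/bash
# Make sure an XDS.INP file contains an active LIB= line.
# ensure_lib_line is a hypothetical helper; the plugin path is supplied by the caller.
ensure_lib_line() {
  local xds_inp="$1" plugin="$2"
  if grep -q '^[[:space:]]*LIB=' "$xds_inp"; then
    echo "LIB= already set in $xds_inp"
  else
    # Append the keyword; lines starting with "!" are comments and do not count.
    printf 'LIB=%s\n' "$plugin" >> "$xds_inp"
    echo "added LIB=$plugin to $xds_inp"
  fi
}

# Example usage (commented out; file name and plugin path are placeholders):
# generate_XDS.INP xxx_master.h5
# ensure_lib_line XDS.INP /usr/local/lib64/dectris-neggia.so
```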
Compression
The number of pixels of the Eiger 16M is three times higher than that of the Pilatus 6M, but since the Eiger firmware update in November 2015, the ("bitshuffle LZ4") compression of the .h5 files containing data is better than that of CBF files, which mostly compensates for the increased number of pixels.
The size of the *master.h5 file from an Eiger 16M experiment at SLS X06SA is more than 300 MB, no matter how many frames are collected. It is therefore advisable to compress the *master.h5 files on-site (this reduces their size by ~75%) before transferring them home on disk or over the internet. A very fast (parallel) program is lbzip2 (available from the EPEL repository for RHEL clones). It is supposedly fully compatible with bzip2.
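As an illustration, the compression step could be wrapped as follows. This is a sketch only: the function name compress_master is hypothetical, and it falls back to plain bzip2 when lbzip2 is not installed. The -k option (keep the input file) is used so that the original stays in place until the compressed copy has been checked.

```shell
#!/bin/bash
# Compress master files with lbzip2 if available, otherwise with bzip2.
# compress_master is a hypothetical helper name.
compress_master() {
  local tool f
  tool=$(command -v lbzip2 || command -v bzip2) || {
    echo "neither lbzip2 nor bzip2 found" >&2
    return 1
  }
  for f in "$@"; do
    # -k keeps the uncompressed original next to the new .bz2 file
    "$tool" -k "$f" || return 1
    echo "$f -> $f.bz2"
  done
}

# Example usage (commented out):
# compress_master *master.h5
```

The files have to be decompressed again (for example with lbzip2 -d or bunzip2) before processing.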
Update 2016-06-05 (Toine Schreurs): an HDF5 file may be compressed with h5repack, e.g. by h5repack -i <in.h5> -o <out.h5> -f GZIP=6 (6 is the default compression level of gzip). This should be a good way to reduce the size of master files while keeping them compatible with processing, but needs to be tested. Whether h5repack uses parallel gzip is not clear from the docs.
Troubleshooting
- Make sure that master.h5 and the corresponding data.h5 files remain together as collected, and don't rename the data.h5 files: they are referred to from master.h5. If you rename the data.h5 files or move them somewhere else, that link is broken unless you fix master.h5.
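A quick sanity check before processing is to confirm that data files are still present next to the master file. The sketch below assumes the naming pattern <prefix>_master.h5 and <prefix>_data_*.h5, which is an assumption about the usual file layout. It only looks at file names in the directory and does not inspect the links stored inside master.h5. The function name check_data_files is hypothetical.

```shell
#!/bin/bash
# Count the data files that sit next to a master file.
# Assumes <prefix>_master.h5 / <prefix>_data_*.h5 naming (an assumption);
# the links stored inside master.h5 are NOT inspected.
check_data_files() {
  local master="$1" prefix f count=0
  prefix="${master%_master.h5}"
  for f in "${prefix}"_data_*.h5; do
    if [ -e "$f" ]; then
      count=$((count + 1))
    fi
  done
  if [ "$count" -gt 0 ]; then
    echo "$count data file(s) found for $master"
  else
    echo "no data files found next to $master" >&2
    return 1
  fi
}

# Example usage (commented out):
# check_data_files /path/to/xxx_master.h5
```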
Less efficient way of processing Eiger data, using conversion to CBF
Since the release of Neggia, a plugin for XDS that parallelizes the reading of images from HDF5 data, conversion by H5ToXds should no longer be required in most usage scenarios. The sections below nevertheless describe this possibility, since preliminary experience with some less common network file systems (apparently GPFS, but not NFS) seems to indicate low performance of Neggia.
Conversion program options: Dectris provides H5ToXds (Linux only!). That program converts (as the name indicates) the HDF5 files to CBF files; however, it does not write the geometry and other information into the CBF header (therefore, generate_XDS.INP or MOSFLM does not work with these files). Alternatives are GlobalPhasing's hdf2mini-cbf program (does not need autoPROC license) or, from http://www.mrc-lmb.cam.ac.uk/harry/imosflm/ver721/downloads, the eiger2cbf-osx or eiger2cbf-linux program written by T. Nakane. The latter programs do write a useful CBF header.
H5ToXds and eiger2cbf-osx / eiger2cbf-linux do not work with files produced at Diamond Light Source.
A script for faster XDS processing of CBF-converted Eiger data (this is only shown out of historic interest)
For faster processing, the shell script below should be copied to /usr/local/bin/H5ToXds and made executable (chmod a+rx /usr/local/bin/H5ToXds*). The binary H5ToXds should then be renamed, e.g. to /usr/local/bin/H5ToXds.bin (note the .bin filename extension!). The script also uses RAM to speed up processing: it provides fast storage for the temporary CBF file that H5ToXds/eiger2cbf/hdf2mini-cbf writes, and that each parallel thread ("processor") of XDS reads. The amount of additional RAM this requires is modest (about (number of pixels)*(number of threads) bytes).
#!/bin/bash
# Kay Diederichs 10/2015
# 3/2017 include RAMdisk creation for MacOS; only lightly tested!
# 3/2016 adapt for eiger2cbf and hdf2mini-cbf
# for the latter see https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ccp4bb;58a4ee1.1603 and
# https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ccp4bb;a048b4e8.1603
#
# Idea: put temporary files into fast local directory, instead of NFS
#
# Installation: Rename Dectris' H5ToXds to H5ToXds.bin
# This script should be called H5ToXds and reside in $PATH
# Modify this script according to which binary you use - see comments below.
#
# Recommendation:
# - for the fast local directory one should use a RAMdisk (one GB size at most)
# - /dev/shm seems to be already set up for that purpose on most Linux distributions
# - on MacOS you can easily set this up as described at http://stackoverflow.com/questions/2033362/does-os-x-have-an-equivalent-to-dev-shm
# example on MacOS for 1GB RAMdisk (needs to be repeated after booting):
# diskutil eraseVolume HFS+ RAMdisk $(hdiutil attach -nomount ram://$((2 * 1024 * 1000)))
#
# on MacOS the next line should then be:
# tempfile="/Volumes/RAMdisk/H5ToXds${PWD//\//_}.$3"
# and on Linux:
tempfile="/dev/shm/H5ToXds${PWD//\//_}.$3"
#
# choose between H5ToXds.bin, eiger2cbf and hdf2mini-cbf; un/comment accordingly
/usr/local/bin/H5ToXds.bin $1 $2 "$tempfile" || rm "$tempfile"
#/usr/local/bin/eiger2cbf-linux $1 $2 "$tempfile" >& /dev/null || rm "$tempfile"
#/usr/local/bin/eiger2cbf-osx $1 $2 "$tempfile" >& /dev/null || rm "$tempfile"
#/usr/local/bin/hdf2mini-cbf $1 $2 "$tempfile" || rm "$tempfile"
ln -sf "$tempfile" $3 2>/dev/null
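The least obvious part of the script is how the temporary file name is built. The sketch below isolates that step. The function name tempfile_name and the example directory and output name are hypothetical; only the ${...//\//_} substitution is taken from the script itself.

```shell
#!/bin/bash
# Reproduce the temporary file name used by the wrapper script.
# ${dir//\//_} is a bash pattern substitution that replaces every "/" with "_",
# so the working directory becomes part of a flat file name on the RAM disk.
tempfile_name() {
  local dir="$1" out="$2" ramdisk="${3:-/dev/shm}"
  echo "${ramdisk}/H5ToXds${dir//\//_}.${out}"
}

# Hypothetical working directory and output file name:
tempfile_name /home/user/run1 frame.cbf
# prints /dev/shm/H5ToXds_home_user_run1.frame.cbf
```

Including the working directory in the name presumably keeps XDS runs started from different directories from overwriting each other's temporary files on the shared RAM disk.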
See also
Keitaro Yamashita's Eiger page, with some emphasis on SPring-8