Quality Control
Latest revision as of 09:57, 2 May 2020
This page collects a number of datasets with different characteristics (high and low resolution, good and bad crystals, untwinned and twinned), documents their evaluation with different versions of XDS up to structure solution (as far as that can be done automatically), and assesses the quality of the resulting data using experimental phasing and refinement.
Assorted projects
For each project mentioned below, both the raw data and the XDS data reduction are available; links are on the project pages, which are named according to their PDB ids.
The raw data (images) for the different datasets are either available from this site by FTP, or from the (publicly accessible!) JCSG dataset archive, or - for 1RQW - from [1] or [2].
There's currently data available for
- 2-wl MAD: PDB id 1T92 (PulG, 116 residues, 2 copies/ASU, spacegroup P6₅22, resolution 2.8 Å)
- 3-wl MAD: PDB id 1ZTV (JCSG target name TB1631F, resolution 3.1 Å) - evaluated with Qingping Xu; it's one of the JCSG datasets.
- native data: PDB id 2GIF (AcrB, spacegroup C2, resolution 2.9 Å - a large membrane protein structure)
- native data for MR: PDB id 1YCE (ATPase C-ring, spacegroup P2₁, resolution 2.4 Å - a large membrane protein structure, 44-fold NCS, strong diffuse scattering)
- 2-wl Bromide MAD: PDB id 1RQW (Thaumatin, a sweet-tasting protein of 207 residues, spacegroup P4₁2₁2, resolution 1.8 Å, collected at the DGK Workshop 2007); either the peak or the inflection dataset could also be treated as SAD. These datasets have been made available by Manfred S. Weiss and Annette Faust (see Xray Tutorial).
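For picking a test case from the list above, it can help to have the entries in machine-readable form. The following is only a minimal sketch: the `Dataset` class, the `ASSORTED` list and both helper functions are hypothetical names introduced here, and the values are copied from the list (the spacegroup of 1ZTV is not given there, so it is left as `None`).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Dataset:
    """One entry of the 'Assorted projects' list (hypothetical container)."""
    pdb_id: str
    experiment: str
    spacegroup: Optional[str]  # None where the list gives no spacegroup
    resolution: float          # in Angstrom, as quoted in the list


# Values transcribed from the list; screw-axis subscripts are written inline.
ASSORTED = [
    Dataset("1T92", "2-wl MAD", "P6522", 2.8),
    Dataset("1ZTV", "3-wl MAD", None, 3.1),
    Dataset("2GIF", "native", "C2", 2.9),
    Dataset("1YCE", "native for MR", "P21", 2.4),
    Dataset("1RQW", "2-wl Bromide MAD", "P41212", 1.8),
]


def better_than(datasets: List[Dataset], limit: float) -> List[str]:
    """PDB ids whose quoted resolution is numerically below `limit`, best first."""
    hits = [d for d in datasets if d.resolution < limit]
    return [d.pdb_id for d in sorted(hits, key=lambda d: d.resolution)]


def experimental_phasing(datasets: List[Dataset]) -> List[str]:
    """PDB ids of entries whose experiment label mentions MAD or SAD."""
    return [d.pdb_id for d in datasets
            if "MAD" in d.experiment or "SAD" in d.experiment]


if __name__ == "__main__":
    print(better_than(ASSORTED, 2.9))
    print(experimental_phasing(ASSORTED))
```

A plain list of frozen dataclasses keeps the sketch dependency-free; nothing here reads the project pages or the raw data.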
ACA2011
As part of the 2011 ACA workshop on data processing, organized by Ed Collins and Andy Torelli, I extensively document XDS data processing (and in most cases also structure solution) of
- 2VB1: hen egg-white lysozyme at 0.65 Å resolution, PDB id 2vb1. Data (sweeps a to h, each comprising 60 to 360 frames of 72 MB) were collected by Zbigniew Dauter at APS 19-ID and are available from http://bl831.als.lbl.gov/example_data_sets/APS/19-ID/2vb1/. Details of data collection, processing and refinement are published (http://journals.iucr.org/d/issues/2007/12/00/be5097/index.html).
- simulated-1g1c: James Holton's simulated (using his MLFSOM program) data of PDB id 1g1c - 100 synthetic datasets (15 frames each) strongly affected by radiation damage. These 100 datasets were simulated in random orientations and with random (?) "crystal" sizes. The processing and scaling can be considered non-standard, and challenging.
- 2QVO: S-SAD: PDB id 2QVO (AF1382), a 95-residue protein used by James Tucker Swindell II to establish optimized procedures for data reduction. The data available to solve the structure are two runs of 360° collected at 1.9 Å wavelength. Datasets are at https://{{SERVERNAME}}/pub/xds-datared/2qvo/.
- 3CSL: 3-wl SeMet-MAD data: PDB id 3CSL, a complex of a 22-stranded beta-barrel outer membrane protein (the ordered residues 112-865 harbour 9 SeMet), its 173-residue hemophore (1 SeMet), and heme. Datasets are at https://{{SERVERNAME}}/pub/xds-datared/3csl. 2 complexes per ASU, useful data to 3.2 Å, useful anomalous data to about 5 Å. Challenging for humans, and too difficult for automatic methods of structure solution and model building.
- 1Y13: SAD with a twist that requires some detective work.
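Before downloading the 2VB1 images it may be worth estimating the volume involved. The list gives only a range of frames per sweep, so the sketch below computes bounds, not an exact figure. It assumes that "sweeps a to h" means exactly eight sweeps and that every frame is 72 MB; the function and constant names are made up for illustration.

```python
FRAME_SIZE_MB = 72          # frame size quoted for 2VB1
SWEEP_LABELS = "abcdefgh"   # assumption: one sweep per letter a to h
MIN_FRAMES = 60             # smallest sweep quoted
MAX_FRAMES = 360            # largest sweep quoted


def volume_bounds_mb(n_sweeps, min_frames, max_frames, frame_mb):
    """Lower and upper bound of the total image volume in MB.

    The lower bound assumes every sweep has `min_frames` frames,
    the upper bound assumes every sweep has `max_frames` frames.
    """
    low = n_sweeps * min_frames * frame_mb
    high = n_sweeps * max_frames * frame_mb
    return low, high


if __name__ == "__main__":
    low, high = volume_bounds_mb(len(SWEEP_LABELS), MIN_FRAMES,
                                 MAX_FRAMES, FRAME_SIZE_MB)
    print(f"2VB1 raw images: between {low} MB and {high} MB")
```

The true total lies somewhere between the two bounds, since the individual sweep lengths are not listed here.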
ACA2014
Data from several projects were processed by experts at the "Data processing with the pros" session of ACA2014 in Albuquerque, New Mexico (USA) at the end of May 2014. This website (http://bl831.als.lbl.gov/example_data_sets/ACA2011/DPWTPreloaded/index.html) has detailed descriptions and links.
CSHL2018
Matthew J. Whitley's excellent tutorial about XDS processing with XDSGUI, from the 2018 Cold Spring Harbor X-Ray Methods in Structural Biology Course, has links to data sets.
Simulated data
I wrote SIM_MX, which makes testing and development of XDS independent of real data.