Scale many datasets: Difference between revisions

Step 1 Zenodo link
m (explain sed command with $ delimiter)
(Step 1 Zenodo link)
Line 1: Line 1:
Graeme Winter measured 36 datasets of cubic insulin, 12 of which were from pig, cow and human, respectively. These are partial datasets, each consisting of 100 frames of 0.2° oscillation range. The goal of data processing is to process all of them with minimal effort and good results. In addition, the question is if the datasets can be correctly assigned to the organisms (or rather if they can be correctly assigned to 3 different groups), based only on the processed data. For the latter task, the result can be compared with the dataset names.
Graeme Winter measured 36 datasets of cubic insulin, 12 of which were from pig, cow and human, respectively. These are partial datasets, each consisting of 100 frames of 0.2° oscillation range. The goal of data processing is to process all of them with minimal effort and good results. In addition, the question is if the datasets can be correctly assigned to the organisms (or rather if they can be correctly assigned to 3 different groups), based only on the processed data. For the latter task, the result can be compared with the dataset names.


=== Step 1: process a single dataset to get an idea of spacegroup, cell and resolution. ===
=== Step 1: download the data from Zenodo ===
According to https://github.com/graeme-winter/dials_tutorials/blob/main/ccp4-aps-2024/COWS_PIGS_PEOPLE.md, the data (~6 GB; https://zenodo.org/records/13890874) were taken on i24 at Diamond Light Source as part of routine commissioning work, with a number of small rotation data sets recorded from different crystals. Crystals were prepared of the protein insulin from cows, pigs and people (as described in the Zenodo deposition: bovine, porcine and human insulin, all of course grown in E. coli anyway).
<pre>
mkdir data
cd data
for set in CIX1_1 CIX2_1 CIX3_1 CIX5_1 CIX6_1 CIX8_1 CIX9_1 CIX10_1 CIX11_1 CIX12_1 CIX14_1 CIX15_1 PIX5_1 PIX6_1 PIX7_1 PIX8_1 PIX9_1 PIX10_1 PIX11_1 PIX12_1 PIX13_1 PIX14_1 PIX15_1 PIX16_1 X1_1 X2_1 X3_1 X4_1 X5_1 X6_1 X7_1 X8_1 X9_1 X11_1 X13_1 X14_1 ; do
wget https://zenodo.org/records/13890874/files/${set}.tar
tar xvf ${set}.tar
rm -v ${set}.tar
done
</pre>
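Before processing, it can be useful to confirm that the list of dataset names in the download loop really contains 36 entries in three groups of 12. The following is only a sketch: the helper name count_prefix is hypothetical, and it assumes nothing more than that the letters before the first digit of each name identify the group. It does not say which prefix belongs to which organism.

```shell
#!/bin/bash
# Dataset names copied from the download loop above.
sets="CIX1_1 CIX2_1 CIX3_1 CIX5_1 CIX6_1 CIX8_1 CIX9_1 CIX10_1 CIX11_1 CIX12_1 CIX14_1 CIX15_1 PIX5_1 PIX6_1 PIX7_1 PIX8_1 PIX9_1 PIX10_1 PIX11_1 PIX12_1 PIX13_1 PIX14_1 PIX15_1 PIX16_1 X1_1 X2_1 X3_1 X4_1 X5_1 X6_1 X7_1 X8_1 X9_1 X11_1 X13_1 X14_1"

# count_prefix PREFIX: print how many names have exactly this alphabetic prefix
# (hypothetical helper; the prefix is everything before the first digit).
count_prefix() {
  n=0
  for s in $sets; do
    p=$(printf '%s' "$s" | sed 's/[0-9].*$//')
    if [ "$p" = "$1" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Print the size of each group.
for prefix in CIX PIX X; do
  echo "$prefix: $(count_prefix "$prefix")"
done
```

The same prefixes can later serve as the labels against which the grouping found from the processed data is compared.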
 
=== Step 2: process a single dataset to get an idea of spacegroup, cell and resolution. ===
This was done for the (randomly chosen) X1 dataset. Turns out that it is cubic insulin, spacegroup I213 with cell about 78 78 78 90 90 90. The XDS_ASCII.HKL of that was saved as x1_as_reference.hkl to serve as REFERENCE_DATA_SET for all other datasets, to ensure consistent indexing, because otherwise the possibility of re-indexing would have to be considered. This can be done by xscale_isocluster but it is easier to use a REFERENCE_DATA_SET if one exists.
This was done for the (randomly chosen) X1 dataset. Turns out that it is cubic insulin, spacegroup I213 with cell about 78 78 78 90 90 90. The XDS_ASCII.HKL of that was saved as x1_as_reference.hkl to serve as REFERENCE_DATA_SET for all other datasets, to ensure consistent indexing, because otherwise the possibility of re-indexing would have to be considered. This can be done by xscale_isocluster but it is easier to use a REFERENCE_DATA_SET if one exists.
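As a minimal sketch of how the reference could be passed to XDS for the other datasets: REFERENCE_DATA_SET= is the XDS.INP keyword, while the helper name add_reference, the directory names and the relative path in the usage comment are assumptions made for illustration and are not taken from the processing script on this page.

```shell
#!/bin/bash
# add_reference XDS_INP REFERENCE_FILE
# Append a REFERENCE_DATA_SET= line to the given XDS.INP, unless an active
# (uncommented) REFERENCE_DATA_SET= line is already present.
add_reference() {
  inp="$1"
  ref="$2"
  if ! grep -q '^[[:space:]]*REFERENCE_DATA_SET=' "$inp"; then
    printf 'REFERENCE_DATA_SET= %s\n' "$ref" >> "$inp"
  fi
}

# Example usage (all paths are assumptions):
#   cp X1_1/XDS_ASCII.HKL x1_as_reference.hkl
#   add_reference CIX1_1/XDS.INP ../x1_as_reference.hkl
```

Calling the helper twice on the same file leaves a single REFERENCE_DATA_SET= line, so it is safe to re-run after regenerating or editing XDS.INP.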


=== Step 2: create 36 directories, named according to the unique parts of the filenames. ===
=== Step 3: create 36 directories, named according to the unique parts of the filenames. ===
Needs a bit of bash scripting.
Needs a bit of bash scripting.
<pre>
<pre>
Line 19: Line 31:
</pre>
</pre>


=== Step 3: process the data by having generate_XDS.INP create XDS.INP for each of them. Then modify that XDS.INP and run xds_par. ===
=== Step 4: process the data by having generate_XDS.INP create XDS.INP for each of them. Then modify that XDS.INP and run xds_par. ===
<pre>
<pre>
#!/bin/bash
#!/bin/bash
Line 50: Line 62:
Running this on a desktop Linux machine with 16 cores takes about 9 minutes.
Running this on a desktop Linux machine with 16 cores takes about 9 minutes.


=== Step 4: scale and merge with xscale ===
=== Step 5: scale and merge with xscale ===
<pre>
<pre>
mkdir xscale
mkdir xscale
Line 65: Line 77:
</pre>
</pre>


=== Step 5: analyze resulting XSCALE.HKL to find 3 groups of datasets ===
=== Step 6: analyze resulting XSCALE.HKL to find 3 groups of datasets ===
... representing pig, cow and human insulin, respectively (but of course it is not clear which group is which organism; one could look at the 1.2 Å electron density maps and compare with sequences).
... representing pig, cow and human insulin, respectively (but of course it is not clear which group is which organism; one could look at the 1.2 Å electron density maps and compare with sequences).
<pre>
<pre>