(copy scripts into wiki, and explain a bit) |
No edit summary |
||
Line 1: | Line 1: | ||
Graeme Winter measured 36 datasets of cubic insulin, 12 each from pig, cow and human. These are partial datasets, each consisting of 100 frames of 0.2° oscillation range. The goal is to process all of them with minimal effort and good results. A further question is whether the datasets can be correctly assigned to the organisms (or rather, to 3 different groups) based only on the processed data. For this latter task, the result can be compared with the dataset names. | Graeme Winter measured 36 datasets of cubic insulin, 12 each from pig, cow and human. These are partial datasets, each consisting of 100 frames of 0.2° oscillation range. The goal is to process all of them with minimal effort and good results. A further question is whether the datasets can be correctly assigned to the organisms (or rather, to 3 different groups) based only on the processed data. For this latter task, the result can be compared with the dataset names. | ||
Step 1: process a single dataset to get an idea of spacegroup, cell and resolution. This was done for the (randomly chosen) X1 dataset. Turns out that it is cubic insulin, spacegroup I213 with cell about 78 78 78 90 90 90. The XDS_ASCII.HKL of that was saved as x1_as_reference.hkl to serve as REFERENCE_DATA_SET for all other datasets, to ensure consistent indexing, because otherwise the possibility of re-indexing would have to be considered. This can be done by xscale_isocluster but it is easier to use a REFERENCE_DATA_SET if one exists. | === Step 1: process a single dataset to get an idea of spacegroup, cell and resolution. === | ||
This was done for the (randomly chosen) X1 dataset. It turns out to be cubic insulin, spacegroup I213, with cell about 78 78 78 90 90 90. Its XDS_ASCII.HKL was saved as x1_as_reference.hkl to serve as REFERENCE_DATA_SET for all other datasets. This ensures consistent indexing; otherwise the possibility of re-indexing would have to be considered. Re-indexing can be done by xscale_isocluster, but it is easier to use a REFERENCE_DATA_SET if one exists. | |||
Step 2: create 36 directories, named according to the unique parts of the filenames. Needs a bit of bash scripting. | === Step 2: create 36 directories, named according to the unique parts of the filenames. === | ||
This needs a bit of bash scripting. | |||
<pre> | <pre> | ||
#!/bin/bash | #!/bin/bash | ||
Line 16: | Line 18: | ||
</pre> | </pre> | ||
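The idea of Step 2 can be sketched as follows. This is not the script from the wiki page (which is only partly visible in this diff) but a minimal illustration. The function name `make_dataset_dirs` and the file naming scheme (`<name>_0001.cbf` for the first frame of each dataset) are assumptions made for the example; adapt the pattern to the real filenames.

```shell
#!/bin/bash
# Sketch only: assumes the first frame of each dataset is called
# <name>_0001.cbf (hypothetical naming scheme, not taken from the wiki page).
make_dataset_dirs() {
  local imgdir=$1 outdir=$2
  local f base name
  for f in "$imgdir"/*_0001.cbf; do
    [ -e "$f" ] || continue          # skip if the glob matched nothing
    base=$(basename "$f")
    name=${base%_0001.cbf}           # unique part of the filename
    mkdir -p "$outdir/$name"         # one processing directory per dataset
  done
}
```

A call such as `make_dataset_dirs /path/to/images .` would then create one directory per dataset in the current directory.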
Step 3: process the data by having generate_XDS.INP create XDS.INP for each of them. Then modify that XDS.INP and run xds_par. | === Step 3: process the data by having generate_XDS.INP create XDS.INP for each of them. Then modify that XDS.INP and run xds_par. === | ||
<pre> | <pre> | ||
#!/bin/bash | #!/bin/bash | ||
Line 46: | Line 48: | ||
Running this on a desktop Linux machine with 16 cores takes about 9 minutes. | Running this on a desktop Linux machine with 16 cores takes about 9 minutes. | ||
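The "modify that XDS.INP" part of Step 3 could look roughly like the sketch below. The helper `set_xds_keyword` is hypothetical (it is not part of XDS or generate_XDS.INP), and it assumes that the keyword being replaced sits alone at the start of its own line, as is the case for SPACE_GROUP_NUMBER= and UNIT_CELL_CONSTANTS= in a typical generated XDS.INP.

```shell
#!/bin/bash
# Sketch only: replace (or add) a KEYWORD= line in an XDS.INP-style file.
# Assumes the keyword starts its own line; commented-out lines ("! KEY=...")
# are left untouched.
set_xds_keyword() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp)
  # keep every line that does not start with KEY= (grep exits 1 if none remain)
  grep -v "^[[:space:]]*${key}=" "$file" > "$tmp" || true
  printf '%s= %s\n' "$key" "$value" >> "$tmp"
  cat "$tmp" > "$file"               # overwrite in place, keep file permissions
  rm -f "$tmp"
}
```

With the values from Step 1, one would call for example `set_xds_keyword XDS.INP SPACE_GROUP_NUMBER 199` (I213 is space group number 199), `set_xds_keyword XDS.INP UNIT_CELL_CONSTANTS "78 78 78 90 90 90"` and `set_xds_keyword XDS.INP REFERENCE_DATA_SET ../x1_as_reference.hkl` in each dataset directory before running xds_par. The relative path to the reference file is an assumption about the directory layout.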
Step 4: scale and merge with xscale | === Step 4: scale and merge with xscale === | ||
<pre> | <pre> | ||
mkdir xscale | mkdir xscale | ||
Line 61: | Line 63: | ||
</pre> | </pre> | ||
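For Step 4, XSCALE.INP needs one INPUT_FILE= line per processed dataset. A minimal sketch of generating such a file is given here; the function name `write_xscale_inp` and the directory layout (one subdirectory per dataset, each holding an XDS_ASCII.HKL) are assumptions for the example, and the actual script on the wiki page may set further keywords.

```shell
#!/bin/bash
# Sketch only: write a minimal XSCALE.INP listing every XDS_ASCII.HKL found
# one level below a data directory (assumed layout, see text).
write_xscale_inp() {
  local datadir=$1 out=$2 f
  {
    echo "OUTPUT_FILE=XSCALE.HKL"
    for f in "$datadir"/*/XDS_ASCII.HKL; do
      [ -e "$f" ] || continue        # skip directories without a result
      echo "INPUT_FILE=$f"
    done
  } > "$out"
}
```

Run from inside the xscale directory as `write_xscale_inp .. XSCALE.INP`. The glob expands in alphabetical order, so the dataset numbering in XSCALE follows the directory names.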
Step 5: analyze resulting XSCALE.HKL to find 3 groups of datasets | === Step 5: analyze resulting XSCALE.HKL to find 3 groups of datasets === | ||
... representing pig, cow and human insulin, respectively (but of course it is not clear which group is which organism; one could look at the 1.2 Å electron density maps and compare with the sequences). | |||
<pre> | <pre> | ||
xscale_isocluster XSCALE.HKL | xscale_isocluster XSCALE.HKL | ||
Line 104: | Line 107: | ||
END | END | ||
</pre> | </pre> | ||
This pseudo-PDB file can be visualized in coot or so and shows three groups, consisting of datasets 1-12, 13-24 and 25-36, respectively. | This pseudo-PDB file can be visualized in coot or so and shows three groups, consisting of datasets 1-12, 13-24 and 25-36, around coordinates (99,0,10), (98,10,-8) and (99,-13,-5), respectively. This sequential ordering agrees with the fact that the datasets were processed according to their names. In other words, the three groups found by [[xscale_isocluster]] correspond to the three different organisms, as expected. | ||
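Instead of inspecting the pseudo-PDB file visually, the grouping can also be checked on the command line. The sketch below assigns each point to the nearest of the three approximate cluster positions quoted above. It assumes that the file uses the standard fixed-column PDB layout (x, y, z in columns 31-54) and that the atom serial number in columns 7-11 identifies the dataset; both are assumptions about the pseudo-PDB output, and `nearest_group` is a hypothetical helper, not part of xscale_isocluster.

```shell
#!/bin/bash
# Sketch only: print "<serial> <group>" for every ATOM/HETATM record, where
# group 1..3 is the nearest of the three approximate cluster positions.
nearest_group() {
  awk '
    BEGIN {
      cx[1] = 99; cy[1] =   0; cz[1] = 10
      cx[2] = 98; cy[2] =  10; cz[2] = -8
      cx[3] = 99; cy[3] = -13; cz[3] = -5
    }
    /^(ATOM  |HETATM)/ {
      x = substr($0, 31, 8) + 0      # standard PDB coordinate columns
      y = substr($0, 39, 8) + 0
      z = substr($0, 47, 8) + 0
      best = 0
      for (i = 1; i <= 3; i++) {
        d = (x-cx[i])*(x-cx[i]) + (y-cy[i])*(y-cy[i]) + (z-cz[i])*(z-cz[i])
        if (best == 0 || d < dmin) { best = i; dmin = d }
      }
      printf "%d %d\n", substr($0, 7, 5) + 0, best
    }
  ' "$@"
}
```

Called with the pseudo-PDB file as its argument, this should list datasets 1-12, 13-24 and 25-36 under three different group numbers if the clusters are as well separated as described.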
Thanks, Graeme! This is nice and shows the possibility to differentiate between crystals of different but similar content. |