Scale many datasets: Difference between revisions
(5 intermediate revisions by the same user not shown) | |||
Line 14: | Line 14: | ||
=== Step 2: process a single dataset to get an idea of spacegroup, cell and resolution. === | === Step 2: process a single dataset to get an idea of spacegroup, cell and resolution. === | ||
This was done for the (randomly chosen) X1 dataset. It turns out to be cubic insulin, spacegroup I213, with cell about 78 78 78 90 90 90. Its XDS_ASCII.HKL was saved as x1_as_reference.hkl to serve as REFERENCE_DATA_SET for all other datasets. This ensures consistent indexing; otherwise the possibility of re-indexing would have to be considered. Re-indexing can be handled by xscale_isocluster, but it is easier to use a REFERENCE_DATA_SET if one exists. | This was done for the (randomly chosen) X1 dataset. It turns out to be cubic insulin, spacegroup I213, with cell about 78 78 78 90 90 90. Its XDS_ASCII.HKL was saved as x1_as_reference.hkl to serve as REFERENCE_DATA_SET for all other datasets. This ensures consistent indexing; otherwise the possibility of re-indexing would have to be considered. Re-indexing can be handled by [[xscale_isocluster]], but it is easier to use a REFERENCE_DATA_SET if one exists. | ||
=== Step 3: create 36 directories, named according to the unique parts of the filenames. === | === Step 3: create 36 directories, named according to the unique parts of the filenames. === | ||
Line 41: | Line 41: | ||
generate_XDS.INP "../../cows-pigs-people/${i}_1_00???.cbf.gz" >&generate_XDS.INP.log | generate_XDS.INP "../../cows-pigs-people/${i}_1_00???.cbf.gz" >&generate_XDS.INP.log | ||
# modifications of XDS.INP | # modifications of XDS.INP | ||
# make it read the cbf.gz files a little faster: | # make it read the cbf.gz files a little faster: ATTENTION - fill in the correct path!!! | ||
echo LIB=/usr/local/lib64/xds-zcbf.so >>XDS.INP | echo LIB=/usr/local/lib64/xds-zcbf.so >>XDS.INP | ||
# if uncommented, the following line makes XDS run only JOB=CORRECT: | # if uncommented, the following line makes XDS run only JOB=CORRECT: | ||
# sed -i 's/XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE//' XDS.INP | # sed -i -e 's/XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE//' XDS.INP | ||
# use all frames for COLSPOT instead of only the first half: | # use all frames for COLSPOT instead of only the first half: | ||
sed -i 's/SPOT_RANGE=1 50/SPOT_RANGE=1 100/' XDS.INP | sed -i -e 's/SPOT_RANGE=1 50/SPOT_RANGE=1 100/' XDS.INP | ||
# use high-resol cutoff of 1.2A according to some preliminary processing: | # use high-resol cutoff of 1.2A according to some preliminary processing: | ||
sed -i 's/RESOLUTION_RANGE=50 0/RESOLUTION_RANGE=50 1.2/' XDS.INP | sed -i -e 's/RESOLUTION_RANGE=50 0/RESOLUTION_RANGE=50 1.2/' XDS.INP | ||
# use a reference data set to get consistent indexing: | # use a reference data set to get consistent indexing: | ||
sed -i 's$! REFERENCE_DATA_SET=xxx/XDS_ASCII.HKL $ REFERENCE_DATA_SET= ../x1_as_reference.hkl $' XDS.INP | sed -i -e 's$! REFERENCE_DATA_SET=xxx/XDS_ASCII.HKL $ REFERENCE_DATA_SET= ../x1_as_reference.hkl $' XDS.INP | ||
# (note the use of the $ delimiter instead of / if the pattern has file paths) | # (note the use of the $ delimiter instead of / if the pattern has file paths) | ||
# if using a reference data set, spacegroup and cell constants must be given | # if using a reference data set, spacegroup and cell constants must be given | ||
sed -i 's/SPACE_GROUP_NUMBER=0/SPACE_GROUP_NUMBER= 199/' XDS.INP | sed -i -e 's/SPACE_GROUP_NUMBER=0/SPACE_GROUP_NUMBER= 199/' XDS.INP | ||
sed -i 's/UNIT_CELL_CONSTANTS= 70 80 90 90 90 90/UNIT_CELL_CONSTANTS= 78 78 78 90 90 90/' XDS.INP | sed -i -e 's/UNIT_CELL_CONSTANTS= 70 80 90 90 90 90/UNIT_CELL_CONSTANTS= 78 78 78 90 90 90/' XDS.INP | ||
# run xds and write its terminal output to logfile | # run xds and write its terminal output to logfile | ||
xds_par >&xds.log | xds_par >&xds.log | ||
Line 60: | Line 60: | ||
done | done | ||
</pre> | </pre> | ||
Running this on my 2020 desktop Linux machine with 16 cores takes about 9 minutes. | Running this on my 2020 desktop Linux machine with 16 cores takes about 9 minutes. Surprisingly, it takes just as long on my 2020 MacBook Air. | ||
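Since the script warns that the LIB= path must be filled in correctly, one could add the line only when the plugin file is really there. The following is a minimal sketch, not part of the original script; the function name add_lib_if_present is made up for illustration, and the path in the example call is simply the one used above.

```shell
# Append a LIB= line to an XDS.INP only if the plugin file actually exists.
# Usage: add_lib_if_present /path/to/plugin.so [XDS.INP]
# (add_lib_if_present is a hypothetical helper, not part of the XDS package)
add_lib_if_present() {
  lib_path=$1
  inp_file=${2:-XDS.INP}
  if [ -f "$lib_path" ]; then
    echo "LIB=$lib_path" >>"$inp_file"
  else
    echo "WARNING: $lib_path not found; LIB= line not added to $inp_file" >&2
    return 1
  fi
}

# Example call with the path from the script above (adjust to your installation):
# add_lib_if_present /usr/local/lib64/xds-zcbf.so XDS.INP
```

Inside the loop, this call would take the place of the plain echo LIB=... line; a non-zero return value signals that the path still needs to be corrected.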
=== Step 5: scale and merge with xscale === | === Step 5: scale and merge with xscale === | ||
Line 66: | Line 66: | ||
mkdir xscale | mkdir xscale | ||
cd xscale | cd xscale | ||
# create XSCALE.INP. | # create XSCALE.INP. Precise average cell parameters were obtained using cellparm (XDS package) from the 36 XDS_ASCII.HKL files | ||
echo UNIT_CELL_CONSTANTS=77.864 77.864 77.864 90 90 90 >XSCALE.INP | echo UNIT_CELL_CONSTANTS=77.864 77.864 77.864 90 90 90 >XSCALE.INP | ||
echo SPACE_GROUP_NUMBER=199 >>XSCALE.INP | echo SPACE_GROUP_NUMBER=199 >>XSCALE.INP | ||
Line 77: | Line 77: | ||
</pre> | </pre> | ||
=== Step 6: analyze resulting XSCALE.HKL to find 3 groups of datasets === | === Step 6: analyze, using [[xscale_isocluster]], the resulting XSCALE.HKL to find 3 groups of datasets === | ||
... representing pig, cow and human insulin, respectively (but of course it is not clear which group is which organism; one could look at the 1.2A electron density maps and compare with sequences). | ... representing pig, cow and human insulin, respectively (but of course it is not clear which group is which organism; one could look at the 1.2A electron density maps and compare with sequences). | ||
<pre> | <pre> | ||
xscale_isocluster XSCALE.HKL | xscale_isocluster XSCALE.HKL | ||
more iso.pdb | more iso.pdb | ||
# this is a pseudo-PDB file with coordinates x,y,z for each dataset: | |||
CRYST1 100.000 100.000 100.000 90.00 90.00 90.00 P 1 | CRYST1 100.000 100.000 100.000 90.00 90.00 90.00 P 1 | ||
HETATM 1 O HOH A 1 99.105 0.039 10.644 1.00100.00 | HETATM 1 O HOH A 1 99.105 0.039 10.644 1.00100.00 | ||
Line 122: | Line 123: | ||
</pre> | </pre> | ||
This pseudo-PDB file can be visualized in coot or so and shows three groups, consisting of datasets 1-12, 13-24 and 25-36, around coordinates (99,0,10), (98,10,-8) and (99,-13,-5), respectively. This sequential ordering agrees with the fact that the datasets were processed according to their names. In other words, the three groups found by [[xscale_isocluster]] correspond to the three different organisms, as expected. | This pseudo-PDB file can be visualized in coot or so and shows three groups, consisting of datasets 1-12, 13-24 and 25-36, around coordinates (99,0,10), (98,10,-8) and (99,-13,-5), respectively. This sequential ordering agrees with the fact that the datasets were processed according to their names. In other words, the three groups found by [[xscale_isocluster]] correspond to the three different organisms, as expected. | ||
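The grouping can also be checked without a graphics program. The sketch below assigns each HETATM record to the nearest of the three approximate centers quoted above. It assumes that iso.pdb follows the standard fixed-width PDB columns (serial in 7-11, x, y, z in 31-54) and that the atom serial number equals the dataset number; the function name assign_groups is made up for illustration, and the centers are hard-coded from the text, not computed.

```shell
# Assign each dataset in a pseudo-PDB file to the nearest of three centers.
# The centers are the approximate values quoted in the text (assumption:
# they are good enough to separate the groups); nothing is clustered here.
# Assumes fixed-width PDB columns: serial 7-11, x 31-38, y 39-46, z 47-54.
assign_groups() {
  LC_ALL=C awk '
    BEGIN {
      cx[1] = 99; cy[1] =   0; cz[1] = 10
      cx[2] = 98; cy[2] =  10; cz[2] = -8
      cx[3] = 99; cy[3] = -13; cz[3] = -5
    }
    /^HETATM/ {
      serial = substr($0, 7, 5) + 0
      x = substr($0, 31, 8) + 0
      y = substr($0, 39, 8) + 0
      z = substr($0, 47, 8) + 0
      best = 0
      for (g = 1; g <= 3; g++) {
        dx = x - cx[g]; dy = y - cy[g]; dz = z - cz[g]
        d = dx * dx + dy * dy + dz * dz
        if (best == 0 || d < dmin) { dmin = d; best = g }
      }
      # output: dataset number, group number (1, 2 or 3)
      print serial, best
    }
  ' "$1"
}

# Example call:
# assign_groups iso.pdb
```

Each output line holds a dataset number followed by its group number, so the listing can be compared directly with the 1-12, 13-24 and 25-36 ranges.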
An even better way is to run | |||
xscale_isocluster -clu 3 XSCALE.HKL | |||
and this will give you three output files XSCALE.1.INP XSCALE.2.INP XSCALE.3.INP each with the correct 12 datasets. | |||
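To then scale each group separately, the three input files can be fed to xscale one after the other. This is only a sketch under assumptions: the function name scale_clusters is made up, it relies on xscale reading XSCALE.INP and writing its log to XSCALE.LP, and it does not touch the OUTPUT_FILE= lines, so please check in the XSCALE.n.INP files that the three runs write differently named reflection files.

```shell
# Run a scaling program once per XSCALE.<n>.INP file in the current directory.
# $1 is the program to run (default: xscale_par).
# (scale_clusters is a hypothetical helper, not part of the XDS package)
scale_clusters() {
  prog=${1:-xscale_par}
  # keep the all-datasets input, since XSCALE.INP is overwritten below
  if [ -f XSCALE.INP ]; then
    cp XSCALE.INP XSCALE.all.INP
  fi
  for f in XSCALE.[0-9]*.INP; do
    [ -f "$f" ] || continue
    n=${f#XSCALE.}
    n=${n%.INP}
    cp "$f" XSCALE.INP
    # terminal output of this run goes to xscale.<n>.log
    if ! "$prog" >"xscale.$n.log" 2>&1; then
      echo "$prog failed for $f" >&2
      return 1
    fi
    # keep the log of this run under a cluster-specific name
    if [ -f XSCALE.LP ]; then
      mv XSCALE.LP "XSCALE.$n.LP"
    fi
  done
}

# Example call, in the xscale directory after xscale_isocluster -clu 3:
# scale_clusters
```

Running everything in the same directory keeps relative INPUT_FILE= paths in the generated files valid, at the price of overwriting XSCALE.INP (hence the copy to XSCALE.all.INP).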
Thanks, Graeme! This is nice and shows the possibility to differentiate between crystals of different but similar content. | Thanks, Graeme! This is nice and shows the possibility to differentiate between crystals of different but similar content. |