Cc analysis

Revision of 5 July 2022
# if all CCs are known, the solution is unique with respect to the lengths of the vectors and the angles between them. However, a rotated (around the origin) or inverted (through the origin) arrangement of the vectors leaves the functional unchanged, because these transformations do not change lengths and angles; the solution is therefore unique only up to such transformations.
# as long as the problem is over-determined, the vectors can be calculated. Unknown CCs between data sets (e.g. crystallographic data sets that have no reflections in common) can then be estimated from the dot product of their vectors. Over-determination means that each data set must be related (directly, or indirectly through others) to every other data set by at least as many CCs as the desired number of dimensions.
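Both properties can be checked numerically. The following minimal Python sketch (using NumPy) takes the 2-dimensional vectors printed by the Python program in the example on this page and assumes, for illustration, that the functional compares each CC with the dot product of the corresponding pair of vectors. It shows that rotating all vectors around the origin, or inverting them through the origin, leaves every dot product (and hence every length and angle) unchanged, and it estimates the CC between data sets 1 and 2 as the dot product of their vectors. The helper name <code>gram</code> is illustrative only.

```python
import numpy as np

# 2-dimensional solution vectors (x, y) from the Python output in the example
x = np.array([
    [0.0241, -0.0098],
    [0.7484,  0.1425],
    [0.9359,  0.0150],
    [0.9758, -0.0112],
    [0.8624, -0.1501],
])

def gram(v):
    """Matrix of all pairwise dot products v_i . v_j."""
    return v @ v.T

phi = 0.7  # arbitrary rotation angle in radians
rotation = np.array([[np.cos(phi), -np.sin(phi)],
                     [np.sin(phi),  np.cos(phi)]])

rotated = x @ rotation.T  # rotate every vector around the origin
inverted = -x             # invert every vector through the origin

# Dot products, and therefore lengths and angles, are unchanged
print(np.allclose(gram(x), gram(rotated)))   # True
print(np.allclose(gram(x), gram(inverted)))  # True

# Estimated CC between data sets 1 and 2 (0-based indices 0 and 1)
print(round(float(x[0] @ x[1]), 4))          # 0.0166
```

The estimate of 0.0166 is close to the CC of 0.017 that was actually given for this pair in the example input.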


== The program ==
<code>cc_analysis</code> calculates the vectors from the pairwise correlation coefficients. The desired (low) dimension must be specified, and an input file must be provided in which each line specifies one correlation coefficient.
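The core idea, treating CCs as dot products of vectors and extracting a low-dimensional representation, can be sketched for an idealized case: a complete, noise-free matrix of dot products whose diagonal (the squared vector lengths) is also known. Under these assumptions the vectors follow from the largest eigenvalues and their eigenvectors. This Python sketch is not the algorithm of <code>cc_analysis</code>, which must also cope with missing and noisy CCs and, as its example output shows, performs additional iterative refinement; <code>vectors_from_dot_products</code> is a hypothetical name.

```python
import numpy as np

def vectors_from_dot_products(g, dim):
    """Return n vectors of length `dim` whose dot products best approximate g.

    g is assumed to be a complete, symmetric n x n matrix of dot products,
    including the diagonal (the squared vector lengths).
    """
    eigval, eigvec = np.linalg.eigh(g)            # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:dim]        # indices of the `dim` largest
    scale = np.sqrt(np.clip(eigval[order], 0.0, None))
    return eigvec[:, order] * scale               # row i is vector i

# Synthetic test: 6 random 2-dimensional vectors and their exact dot products
rng = np.random.default_rng(0)
true_vectors = rng.normal(size=(6, 2))
g = true_vectors @ true_vectors.T

recovered = vectors_from_dot_products(g, dim=2)

# The recovered vectors may be rotated or inverted relative to the true ones,
# but they reproduce all dot products (lengths and angles)
print(np.allclose(recovered @ recovered.T, g))  # True
```

Discarding the smaller eigenvalues is what makes the representation low-dimensional. With real, noisy CCs the discarded eigenvalues are small but not zero, as in the list of unused eigenvalues in the example output.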


  CC_ANALYSIS version 30.12.2018 (K. Diederichs). No redistribution please!
  cc_analysis -dim <dim> [-b] [-w] [-z] <input.dat> <output.dat>
  <input.dat> has lines with items: i j corr [ncorr]
  -w option: calculate weights from the number of correlated items (4th item on input line)
  -z option: use Fisher z-transformation
  -f option: skip some calculations (fast)
  -m <iters> option: use <iters> (default 20) least-squares iterations
  -t <threads> option: use <threads> (default 8) threads
Notes:
* the number of vectors must be > 2*(low dimension). Typically the number of dimensions is 2 or 3, but depending on the problem it can be much higher.
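The input format and the <code>-z</code> option can be illustrated with a small Python sketch. <code>read_cc_file</code> and <code>enough_vectors</code> are hypothetical helpers, not part of <code>cc_analysis</code>: the sketch only parses lines of the form <code>i j corr [ncorr]</code>, optionally applies the Fisher z-transformation z = atanh(r), and checks the requirement from the notes that the number of vectors exceeds 2*(low dimension). How the program turns ncorr into weights for the <code>-w</code> option is not described here, so the sketch merely stores it. The CC values are taken from the correlation matrix in the example; the ncorr value of 100 is made up.

```python
import math

def read_cc_file(lines, fisher_z=False):
    """Parse lines 'i j corr [ncorr]' into {(i, j): (value, ncorr)} with i < j.

    ncorr is None when the optional 4th item is missing.
    """
    ccs = {}
    for line in lines:
        items = line.split()
        if not items:
            continue  # ignore empty lines
        i, j = int(items[0]), int(items[1])
        corr = float(items[2])
        ncorr = int(items[3]) if len(items) > 3 else None
        if fisher_z:
            # Fisher z = 0.5 * ln((1 + r) / (1 - r)); undefined for |r| >= 1
            corr = math.atanh(corr)
        ccs[(min(i, j), max(i, j))] = (corr, ncorr)
    return ccs


def enough_vectors(ccs, dim):
    """True if the number of distinct vectors exceeds 2 * dim (see Notes)."""
    ids = {k for pair in ccs for k in pair}
    return len(ids) > 2 * dim


# CCs from the example correlation matrix; ncorr = 100 is made up
lines = [
    "1 2 0.017 100",
    "2 3 0.7026",
    "3 4 0.9131",
    "4 5 0.8432",
]
ccs = read_cc_file(lines, fisher_z=True)
print(len(ccs), enough_vectors(ccs, dim=2))  # 4 True
```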


A Linux binary is available [ftp://strucbio.biologie.uni-konstanz.de/pub/cc_analysis].
Python code is available [https://strucbio.biologie.uni-konstanz.de/pub/cc_analysis.py] under the GPL.


== Example ==
     4  0.9760  0.0087  0.9760  0.0089
     5  0.8626  0.1361  0.8733  0.1564
</pre>
The output is: <vector #> <x> <y> <length> <angle> for each vector in this 2-dimensional case, and analogously for higher dimensions.
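The length and angle columns follow from x and y. The sample rows above are consistent with the length being sqrt(x² + y²) and the angle being atan2(y, x) in radians; this is inferred from the printed numbers rather than documented, and the last digit may differ slightly because x and y are printed rounded. A minimal Python sketch (<code>length_and_angle</code> is a hypothetical helper):

```python
import math

def length_and_angle(x, y):
    """Length of vector (x, y) and its angle to the x axis in radians."""
    return math.hypot(x, y), math.atan2(y, x)

# Rows 4 and 5 of the example output: (x, y)
for x, y in [(0.9760, 0.0087), (0.8626, 0.1361)]:
    length, angle = length_and_angle(x, y)
    print(f"{length:.4f} {angle:.4f}")
```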


The Python code produces the following output:
<pre>
bash-4.2$ python /usr/local/bin/cc_analysis.py 2 cc.dat
===
Correlation matrix parsed from infile:
[[  nan 0.017  0.0222 0.0233 0.0226]
[0.017    nan 0.7026 0.7287 0.6241]
[0.0222 0.7026    nan 0.9131 0.8049]
[0.0233 0.7287 0.9131    nan 0.8432]
[0.0226 0.6241 0.8049 0.8432    nan]]
===
Correction factor for 2nd and higher eigenvalue(s):
0.8000
===
Interpretation of correlation matrix as dot product matrix:
---
all h_i by iterative approach:
initial values:
[0.1459 0.7198 0.7815 0.7919 0.7574]
refinement by iteration:
#13: [0.0242 0.7414 0.9389 0.9798 0.8551]
===
Uncorrected eigenvalue(s):
2 used:
[3.1228 0.0126]
3 unused:
[ 0.0045 -0.0004 -0.0167]
---
Corrected eigenvalue(s):
2 used:
[3.1228 0.0158]
iter      RMS  max_chg  rms_chg
  0  0.00345        -        -
  1  0.00241 -0.04680  0.00403
  2  0.00127  0.01783  0.00275
  3  0.00057  0.01297  0.00162
  4  0.00029  0.00570  0.00073
  5  0.00023  0.00182  0.00026
  6  0.00023 -0.00047  0.00008
  7  0.00022 -0.00042  0.00004
  8  0.00022 -0.00042  0.00004
  9  0.00022 -0.00045  0.00005
  10  0.00022 -0.00045  0.00005
  11  0.00022 -0.00045  0.00005
  12  0.00021 -0.00044  0.00005
  13  0.00021 -0.00043  0.00005
  14  0.00021 -0.00043  0.00005
  15  0.00021 -0.00042  0.00004
  16  0.00021 -0.00042  0.00004
  17  0.00020 -0.00042  0.00004
  18  0.00020 -0.00042  0.00004
  19  0.00020 -0.00042  0.00004
  20  0.00020 -0.00042  0.00004
    1  0.0241 -0.0098
    2  0.7484  0.1425
    3  0.9359  0.0150
    4  0.9758 -0.0112
    5  0.8624 -0.1501
===
Finished outputting 2-dimensional representative vectors! =)
</pre>
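The 5 lines at the bottom give the solution. The coordinates agree with those of the Fortran program within an rms deviation of 0.0055; however, they are mirrored across the x axis (all y coordinates have the opposite sign). Like a rotated or inverted arrangement, such a mirror image preserves all lengths and angles and is therefore an equally valid solution.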
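Since rotated, inverted or mirrored solutions are equally valid, two solutions are best superimposed before their coordinates are compared. One way to do this, chosen here for illustration (the page does not say how the rms deviation of 0.0055 was obtained), is orthogonal Procrustes superposition: it finds the orthogonal matrix, possibly including a reflection, that best maps one set of vectors onto the other. The minimal NumPy sketch below uses the Python solution and a synthetic mirror image of it (y to -y); <code>align</code> is a hypothetical helper, and its rms is defined per vector.

```python
import numpy as np

def align(a, b):
    """Rotate/reflect the rows of a to best match b (orthogonal Procrustes).

    Returns the transformed copy of a and its rms deviation (per vector) from b.
    """
    u, _, vt = np.linalg.svd(a.T @ b)
    q = u @ vt                      # orthogonal matrix minimising ||a q - b||
    a_aligned = a @ q
    rms = np.sqrt(np.mean(np.sum((a_aligned - b) ** 2, axis=1)))
    return a_aligned, rms

# Python solution from the example, and a mirrored copy (y -> -y)
a = np.array([[0.0241, -0.0098],
              [0.7484,  0.1425],
              [0.9359,  0.0150],
              [0.9758, -0.0112],
              [0.8624, -0.1501]])
b = a * np.array([1.0, -1.0])

aligned, rms = align(a, b)
print(np.allclose(aligned, b), rms < 1e-10)  # True True
```

Applied to all five vectors of both programs' outputs, the same superposition would absorb the mirror relation automatically before the rms deviation is computed.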