Cc analysis: Difference between revisions

144 bytes removed, 11 August 2023
m
m (→‎The program: fix link)
 
(2 intermediate revisions by the same user not shown)
Line 22: Line 22:
# as long as the problem is well-determined, the vectors can be calculated. Unknown CCs between data sets (e.g. crystallographic data sets that have no common reflections) can be estimated from the dot product of their vectors. Well-determined means that each data set is related to every other, directly or indirectly (i.e. through others), by at least as many CCs as the desired number of dimensions. A necessary condition is that each data set has at least as many relations to others (input lines to cc_analysis involving this data set) as there are dimensions. Of course, specifying more relations is better.
# as long as the problem is well-determined, the vectors can be calculated. Unknown CCs between data sets (e.g. crystallographic data sets that have no common reflections) can be estimated from the dot product of their vectors. Well-determined means that each data set is related to every other, directly or indirectly (i.e. through others), by at least as many CCs as the desired number of dimensions. A necessary condition is that each data set has at least as many relations to others (input lines to cc_analysis involving this data set) as there are dimensions. Of course, specifying more relations is better.
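
The two statements in this item can be illustrated with a minimal Python sketch. It is not part of <code>cc_analysis</code>; the function names <code>estimate_cc</code> and <code>underdetermined_sets</code> and all numbers are made up for illustration. The first function estimates a missing CC as the dot product of two solution vectors; the second checks only the necessary condition (each data set occurs in at least <code>dim</code> input relations), which does not by itself guarantee a well-determined problem.

```python
import numpy as np

def estimate_cc(x_i, x_j):
    """Estimate the CC between two data sets as the dot product of their vectors."""
    return float(np.dot(x_i, x_j))

def underdetermined_sets(pairs, dim):
    """Return ids of data sets that occur in fewer than `dim` input relations.

    `pairs` holds the (i, j) columns of the input lines. An empty result
    means only that the necessary condition is met, not that the problem
    is well-determined.
    """
    counts = {}
    for i, j in pairs:
        counts[i] = counts.get(i, 0) + 1
        counts[j] = counts.get(j, 0) + 1
    return sorted(k for k, n in counts.items() if n < dim)

# Hypothetical 2-dimensional vectors for two data sets without a measured CC
x1 = np.array([0.9, 0.1])
x2 = np.array([0.8, 0.3])
print(estimate_cc(x1, x2))

# Hypothetical list of measured relations; data set 4 occurs only once
pairs = [(1, 2), (1, 3), (2, 3), (3, 4)]
print(underdetermined_sets(pairs, dim=2))
```

In this made-up example, data set 4 would be reported because it takes part in only one relation while two dimensions are requested.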


== The program ==
== The Fortran program ==
<code>cc_analysis</code> calculates the vectors from the pairwise correlation coefficients. The (low) dimension must be specified, and a file with lines specifying the correlation coefficients must be provided.  
<code>cc_analysis</code> calculates the vectors from the pairwise correlation coefficients. The (low) dimension must be specified, and a file with lines specifying the correlation coefficients must be provided.  


Line 29: Line 29:
  <input.dat> has lines with items: i j corr [ncorr]
  <input.dat> has lines with items: i j corr [ncorr]
  -b option: <input.dat> is a binary file (4 bytes for each item)
  -b option: <input.dat> is a binary file (4 bytes for each item)
  -w option: calculate weights from of correlated items (4th item on input line)
  -w option: calculate weights from number of correlated items (4th item on input line)
  -z option: use Fisher z-transformation
  -z option: use Fisher z-transformation
  -f option: skip some calculations (fast)
  -f option: skip some calculations (fast)
Line 55: Line 55:
     4      5  0.8432
     4      5  0.8432
</pre>
</pre>
Please note that the CCs are rounded to 4 valid digits. This introduces a bit of noise.
Please note that the CCs are rounded to 4 valid digits. This introduces a bit of noise. In addition, the solution in this example is not overdetermined; 5 is just the minimum number of objects that can be represented in 2-dimensional space when given 10 unique correlation coefficients.
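
One way to read this count is to compare the number of unique pairwise CCs, n(n-1)/2, with the number of coordinates to be determined, n times the dimension. This is an interpretation of the statement above, not a criterion taken from the program, and it ignores the freedom to rotate or mirror the solution. A short Python sketch with the hypothetical helper <code>min_objects</code>:

```python
def min_objects(dim):
    """Smallest n for which n*(n-1)/2 pairwise CCs cover n*dim coordinates.

    Simple counting argument (an assumption, not the program's own test).
    """
    n = 2
    while n * (n - 1) // 2 < n * dim:
        n += 1
    return n

n = min_objects(2)
print(n, n * (n - 1) // 2)  # objects, unique CCs
```

For two dimensions this gives 5 objects with 10 unique CCs and 10 coordinates, so there are no CCs to spare.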
<pre>
<pre>
bash-4.2$ cc_analysis -dim 2 cc.dat solution.dat
bash-4.2$ cc_analysis -dim 2 cc.dat solution.dat
  CC_ANALYSIS version 29.10.2018 (K. Diederichs). No redistribution please!
  CC_ANALYSIS version 29.10.2018 (K. Diederichs). No redistribution please!
compiler: Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 18.0.5.274 Build 20180823 options: -msse4.2 -L/usr/local/src/arpack/ -larpack_ifort -lmkl_lapack95_lp64 -lmkl_blas95_lp64 -mkl -static-intel -qopenmp-link=static -sox -traceback -qopenmp -align array64byte -assume buffered_io -o /home/dikay/bin/cc_analysis


Linux turn31.biologie.uni-konstanz.de 4.18.14-1.el7.elrepo.x86_64 #1 SMP Sat Oct 13 10:29:59 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Linux turn31.biologie.uni-konstanz.de 4.18.14-1.el7.elrepo.x86_64 #1 SMP Sat Oct 13 10:29:59 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Line 165: Line 163:
Finished outputting 2-dimensional representative vectors! =)
Finished outputting 2-dimensional representative vectors! =)
</pre>
</pre>
The 5 lines at the bottom give the solution. The coordinates agree with those of the Fortran program within a rms deviation of 0.0055; however they are mirrored across the x axis and thus represent an inverted solution.
The 5 lines at the bottom give the solution. The coordinates agree with those of the Fortran program within a rms deviation of 0.0055 after mirroring across the x axis (inverted solution) and rotating by half a degree.
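
A comparison of this kind can be sketched in Python with an orthogonal Procrustes fit, which finds the best rotation and/or mirroring about the origin before the rms deviation is computed. This is an illustration under assumptions: the function name <code>rms_after_alignment</code> and the coordinates are hypothetical, the rms is taken over the point-to-point distances, and it is not known whether the value quoted above was obtained this way.

```python
import numpy as np

def rms_after_alignment(a, b):
    """RMS deviation between point sets a and b (n x d) after aligning b to a.

    The alignment is the orthogonal matrix r minimising ||a - b @ r||
    (orthogonal Procrustes); det(r) < 0 indicates that a mirroring was needed.
    No translation is applied, since the vectors are defined relative to the origin.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    u, _, vt = np.linalg.svd(b.T @ a)
    r = u @ vt
    diff = a - b @ r
    rms = float(np.sqrt((diff ** 2).sum(axis=1).mean()))
    return rms, r

# Hypothetical coordinates; b is a mirrored across the x axis and rotated by 0.5 degrees
a = np.array([[0.9, 0.1], [0.8, 0.3], [0.2, 0.7], [0.1, 0.9], [0.5, 0.5]])
theta = np.deg2rad(0.5)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
b = (a * np.array([1.0, -1.0])) @ rot.T

rms, r = rms_after_alignment(a, b)
print(rms, np.linalg.det(r))
```

A negative determinant of <code>r</code> corresponds to the inverted (mirrored) solution described above.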