Cc analysis

The function <math>\Phi(\mathbf{x})</math> is minimized with respect to the vector <math>\mathbf{x}</math>, the column vector formed by the N low-dimensional vectors <math>\{x_{k}\}</math>. The minimization can be performed from random starting positions, or more elegantly and efficiently by obtaining starting positions through eigendecomposition after estimating the missing values of the matrix of correlation coefficients.
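In its simplest, unweighted form, the function being minimized can be written as (cf. eqn. 1 of [http://journals.iucr.org/d/issues/2017/04/00/rr5141/index.html Diederichs, Acta D (2017)]; any weighting of the individual terms is omitted here)

<math>\Phi(\mathbf{x}) = \sum_{i<j} \left( \mathrm{cc}_{ij} - \mathbf{x}_i \cdot \mathbf{x}_j \right)^2</math>

where <math>\mathrm{cc}_{ij}</math> is the experimentally determined correlation coefficient between data sets i and j, and the sum runs over those pairs for which a CC could actually be measured.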


As the resulting vectors <math>\{x_{k}\}</math> are in low-dimensional space, whereas the data sets reside in high-dimensional space, the procedure may be considered a form of ''multidimensional scaling''. This particular procedure was first described in [http://journals.iucr.org/d/issues/2017/04/00/rr5141/index.html Diederichs, Acta D (2017)]. We can also think of the procedure as ''unsupervised learning'', because it "learns" from the given CCs and predicts the unknown CCs - or rather, the relations even between data sets that have nothing (crystallography: no reflections; imaging: no pixels) in common.
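The following minimal Python sketch illustrates the idea (it is not the actual cc_analysis program): low-dimensional vectors are fitted to a partially observed CC matrix by least squares, starting from random positions, and unknown CCs are then predicted as dot products of the fitted vectors. The function name <code>fit_vectors</code>, the array layout and the use of a general-purpose optimizer are assumptions made only for this illustration.

<pre>
import numpy as np
from scipy.optimize import minimize

def fit_vectors(cc, known, dim=2, seed=0):
    # cc    : (N, N) symmetric matrix of pairwise CCs; entries without a
    #         measured CC may hold any value (they are ignored)
    # known : (N, N) boolean matrix, True where cc[i, j] was actually measured
    # dim   : dimension of the fitted vectors (2 or 3 is typical)
    n = cc.shape[0]
    rng = np.random.default_rng(seed)
    x0 = 0.1 * rng.standard_normal((n, dim))   # random starting positions

    iu = np.triu_indices(n, k=1)               # all pairs i < j
    use = known[iu]

    def phi(flat):
        x = flat.reshape(n, dim)
        pred = (x @ x.T)[iu]                   # predicted CCs = dot products
        resid = cc[iu][use] - pred[use]
        return np.sum(resid ** 2)              # least-squares target

    res = minimize(phi, x0.ravel(), method="L-BFGS-B")
    return res.x.reshape(n, dim)

# After fitting, an unknown CC between data sets i and j is predicted
# as the dot product x[i] @ x[j] of their low-dimensional vectors.
</pre>

In practice, eigendecomposition-based starting positions (as described above) make the minimization considerably more efficient; the sketch only shows the structure of the least-squares problem.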


== Properties of cc_analysis ==
It turns out that the dot product (also called scalar product, or inner product) of the low-dimensional vectors (representing the high-dimensional data sets) is very appropriate for approximating the CCs. This is because a CC is itself a dot product (eqn. 3 of [http://journals.iucr.org/d/issues/2017/04/00/rr5141/index.html Diederichs, Acta D (2017)]).
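To make this explicit: if two data sets are written as high-dimensional vectors <math>\mathbf{u}</math> and <math>\mathbf{v}</math> (one component per common observation; these symbols are introduced here only for illustration), their Pearson correlation coefficient is the dot product of the mean-centred, normalized vectors,

<math>\mathrm{cc}(\mathbf{u},\mathbf{v}) = \frac{(\mathbf{u}-\bar{u}\mathbf{1})\cdot(\mathbf{v}-\bar{v}\mathbf{1})}{\|\mathbf{u}-\bar{u}\mathbf{1}\|\,\|\mathbf{v}-\bar{v}\mathbf{1}\|}</math>

where <math>\bar{u}</math> and <math>\bar{v}</math> are the respective means and <math>\mathbf{1}</math> is the vector of ones; a crystallographic form of this expression is given in eqn. 3 of the cited paper.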


It further turns out that the arrangement of vectors that minimizes the function <math>\Phi(\mathbf{x})</math> given above has the following properties:
# 0 <= length <= 1 for each vector; short vectors have a low signal-to-noise ratio, long vectors a good signal-to-noise ratio, and vectors of length 1 represent ideal (prototypical) data sets. In fact, the length of each vector is CC* (as defined in [http://dx.doi.org/10.1126/science.1218231 Karplus & Diederichs (2012)]), and its exact relation to the signal-to-noise ratio is given in eqn. 4.
# vectors point in the same direction (i.e. they lie on a radial line) if their data sets only differ by random noise (see the geometric reading below)
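Taken together, these properties suggest a simple geometric reading of the fitted dot products:

<math>\mathbf{x}_i \cdot \mathbf{x}_j = \|\mathbf{x}_i\|\,\|\mathbf{x}_j\|\,\cos\theta_{ij}</math>

where the two lengths reflect the signal-to-noise ratio (random error) of the individual data sets, and the angle <math>\theta_{ij}</math> (a symbol introduced here only for illustration) reflects their systematic differences; <math>\theta_{ij} = 0</math> for data sets that differ only by random noise.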