- Compute the matrix of euclidean distances between the $N$ cases by variable $X$, and another such matrix by variable $Y$. Either of the two quantitative features, $X$ or $Y$, may be multivariate, not just univariate.
- Perform double centering of each matrix. (See how double centering is usually done.) However, in our case, when doing it, do not square the distances initially and do not divide by $-2$ in the end. Row means, column means, and the overall mean of the elements become zero.
- Multiply the two resulting matrices elementwise and compute the sum; or, equivalently, unwrap the matrices into two column vectors and compute their summed cross-product.
- Average, dividing the sum by the number of elements, $N^2$.
- Take the square root. The result is the distance covariance between $X$ and $Y$.
- Distance variances are the distance covariances of $X$ and of $Y$ with themselves; you compute them likewise, by points 3-4-5.
- Distance correlation is obtained from the three numbers analogously to how Pearson correlation is obtained from the usual covariance and the pair of variances: divide the covariance by the square root of the product of the two variances. (A code sketch of the whole recipe follows this list.)
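Here is a minimal sketch of the recipe in Python (assuming NumPy and SciPy; the names `distance_corr` and `_centered_dists` are mine, not from any particular library):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def _centered_dists(data):
    """Points 1-2: pairwise euclidean distances, double-centered
    WITHOUT squaring them first and WITHOUT dividing by -2."""
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:                     # univariate -> one column
        data = data[:, None]
    d = squareform(pdist(data))            # N x N euclidean distances
    return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

def distance_corr(x, y):
    """Sample distance correlation of two (possibly multivariate) samples."""
    a, b = _centered_dists(x), _centered_dists(y)
    n = a.shape[0]
    dcov2 = (a * b).sum() / n**2           # points 3-4: averaged cross-product
    dvarx2 = (a * a).sum() / n**2          # point 6: dCov of X with itself
    dvary2 = (b * b).sum() / n**2
    # point 5 takes the root of dcov2; point 7 divides by the square root
    # of the product of the two distance variances (themselves roots)
    return np.sqrt(dcov2) / (dvarx2 * dvary2) ** 0.25
```

With independent data `distance_corr` hovers near zero, while under a nonlinear dependence it stays clearly positive even where Pearson $r$ is near zero.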
In euclidean space, a scalar product is the similarity univocally tied to the corresponding distance. If you have two points (vectors), you may express their closeness as a scalar product instead of their distance without losing information.
Now, the usual double centering of the distance matrix (between the points of a cloud) is the operation of converting the distances into scalar products while placing the origin at the cloud's geometric middle. In doing so, the "network" of distances is equivalently replaced by the "burst" of vectors, of specific lengths and pairwise angles, emanating from the origin.
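A small demonstration of that equivalence (a sketch, assuming NumPy/SciPy; note that the classic double centering here does include the squaring and the $-1/2$ factor, unlike point 2 above):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(1)
pts = rng.normal(size=(5, 2))                # a small cloud of 5 points in 2D
d2 = squareform(pdist(pts)) ** 2             # squared euclidean distances

n = len(pts)
J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
gram = -0.5 * J @ d2 @ J                     # classic double centering

centered = pts - pts.mean(axis=0)            # origin placed at the centroid
print(np.allclose(gram, centered @ centered.T))  # True: scalar products recovered
```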
The additional taking of the square root afterwards (point 5) seems logical because in our case the moment was already itself a sort of covariance (a scalar product and a covariance are structural compeers), so it came out that covariances were, in a sense, multiplied twice. Therefore, in order to descend back to the level of the values of the original data (and to be able to compute a correlation value), one has to take the root afterwards.
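In symbols (writing $A$ and $B$ for the two centered matrices from point 2), points 3-4-5 amount to

$$\mathrm{dCov}(X,Y)=\sqrt{\frac{1}{N^{2}}\sum_{i,j=1}^{N}A_{ij}B_{ij}}\,,$$

and since each term $A_{ij}B_{ij}$ is a product of two covariance-like quantities, the outer root returns the result to the scale of the original data.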
One important note should finally be made. If we were doing the double centering its classic way - that is, after squaring the euclidean distances - then we would end up with a quantity that is not the true distance covariance and is not useful: it degenerates into a quantity exactly related to the usual covariance (and the distance correlation becomes a function of the linear Pearson correlation). What makes distance covariance/correlation unique and capable of measuring not linear association but a generic form of dependency, so that dCov=0 if and only if the variables are independent, is the lack of squaring of the distances when performing the double centering (see point 2). Actually, any power of the distances in the range $(0,2)$ would do; however, the standard form is to do it with power $1$. Why this power, and not power $2$, lets the coefficient become a measure of nonlinear interdependency is quite a tricky (for me) mathematical issue bearing on characteristic functions of distributions, and I would like somebody more educated to explain here the mechanics of distance covariance/correlation in possibly simple words (I once attempted it, unsuccessfully).
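The degeneration itself is easy to see numerically. Here is a sketch (assuming NumPy/SciPy; `dcor_with_power` is a hypothetical helper of mine that generalizes the recipe above to an arbitrary power of the distances, univariate data only):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def dcor_with_power(x, y, power):
    """Distance correlation, but raising the (univariate) distances
    to the given power before the non-classic double centering."""
    def centered(v):
        d = squareform(pdist(np.asarray(v, float)[:, None])) ** power
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    a, b = centered(x), centered(y)
    n = a.shape[0]
    dcov2 = (a * b).sum() / n**2
    return np.sqrt(dcov2) / (((a * a).sum() / n**2) * ((b * b).sum() / n**2)) ** 0.25

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x**2 + 0.1 * rng.normal(size=500)       # strong but nonlinear dependence

print(abs(np.corrcoef(x, y)[0, 1]))         # near 0: Pearson r misses it
print(dcor_with_power(x, y, 1))             # clearly positive: power 1 sees it
print(dcor_with_power(x, y, 2))             # equals |r|: power 2 degenerates
```

With power $2$ the coefficient collapses exactly to $|r|$, whereas with power $1$ it detects the quadratic dependence that Pearson correlation cannot.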