As far as I understand, distance correlation is a robust and universal way to check whether there is a relation between two numeric variables. For example, if we have a set of pairs of numbers:
$(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ we can use distance correlation to check whether there is any (not necessarily linear) relation between the two variables ($x$ and $y$). Moreover, $x$ and $y$ can be vectors of different dimensions.
It is relatively easy to calculate distance correlation. First we use the $x_i$ to calculate a distance matrix. Then we calculate a distance matrix using the $y_i$. The two distance matrices have the same dimensions because the number of $x_i$ equals the number of $y_i$ (they come in pairs).
Now we have many distances that can be paired. For example, element $(2,3)$ of the first distance matrix is paired with element $(2,3)$ of the second distance matrix. So we have a set of pairs of distances, and we can use it to calculate a correlation (a correlation between distances).
If the two types of distances are correlated, then close $x$s usually imply close $y$s. For example, if $x_7$ is close to $x_{13}$, then $y_7$ is likely to be close to $y_{13}$. So we can conclude that the $x$s and $y$s are dependent.
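To make sure I have stated the procedure correctly, here is a minimal sketch (Python/NumPy, with made-up data; the variable names are mine) of the naive version I have in mind: build the two distance matrices, pair up corresponding entries, and compute an ordinary Pearson correlation between them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=(n, 2))                              # x_i can be vectors
y = x[:, :1] ** 2 + rng.normal(scale=0.1, size=(n, 1))   # nonlinear relation, y_i of different dimension

# Pairwise Euclidean distance matrices, one per variable.
a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)

# Naive idea: pair entry (i, j) of the first matrix with entry (i, j)
# of the second, then take a plain Pearson correlation of the pairs.
naive = np.corrcoef(a.ravel(), b.ravel())[0, 1]
```

Note that this sketch deliberately skips the double-centering step and keeps the zero diagonals, which is exactly what my two questions below are about.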
This sounds reasonable; however, there are two aspects that I do not understand.
First, to calculate distance correlation we do not use the two distance matrices directly. We apply a double-centering procedure to them (so that the sum of the elements in every row, and in every column, is equal to zero). I do not understand why we need to do this. What is the logic (or intuition) behind this step?
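For concreteness, this is the double-centering step as I understand it (a NumPy sketch; the helper name `double_center` is mine). As far as I can tell, the squared sample distance covariance is then just the mean of the elementwise product of the two centered matrices, and distance correlation is that quantity normalized by the distance variances:

```python
import numpy as np

def double_center(d):
    """Subtract each row mean and each column mean, then add back the
    grand mean. Every row and column of the result sums to zero."""
    return (d
            - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True)
            + d.mean())

# Tiny example: distance matrices of two 1-D samples, y an exact
# linear function of x.
x = np.array([0.0, 1.0, 3.0])
y = 2.0 * x
a = np.abs(x[:, None] - x[None, :])
b = np.abs(y[:, None] - y[None, :])

A, B = double_center(a), double_center(b)
dcov2 = (A * B).mean()                # squared sample distance covariance
dcor = np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))
# dcor is 1 here, since y is an exact linear function of x
```

Every row and column of `A` and `B` sums to zero after centering, which is the property I am asking about: why is this centering needed before correlating the distances?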
Second, the original distance matrices have zeros on the diagonal. So, if we calculate the correlation between the distances, we will get a statistically significant correlation just because the many zeros in the first matrix are paired with the corresponding zeros in the second matrix. How is this problem resolved?