Notice removed Draw attention by CommunityBot
Bounty Ended with ttnphns's answer chosen by CommunityBot
added a tag
Source Link
amoeba

As far as I understand, distance correlation is a robust and universal way to check whether there is a relation between two numeric variables. For example, suppose we have a set of pairs of numbers:

(x1, y1) (x2, y2) ... (xn, yn) 

we can use distance correlation to check if there is any (not necessarily linear) relation between the two variables (x and y). Moreover, x and y can be vectors of different dimensions.

It is relatively easy to calculate distance correlation. First, we use the $x_i$ to calculate a distance matrix. Then we calculate a second distance matrix from the $y_i$. The two distance matrices have the same dimensions ($n \times n$) because the $x_i$ and $y_i$ come in pairs, so there are equally many of each.
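To make this step concrete, here is a minimal sketch of how I picture the two distance matrices. It assumes Euclidean distance; the helper name `distance_matrix` and the toy data are my own and are only for illustration. Note that the $x_i$ are 2-dimensional and the $y_i$ are 1-dimensional, yet both matrices come out $3 \times 3$.

```python
import math

def distance_matrix(points):
    """Return the n x n matrix of Euclidean distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]

# Toy data (hypothetical): x is 2-dimensional, y is 1-dimensional.
# Observations are paired by index, so x[i] belongs with y[i].
x = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
y = [(1.0,), (2.0,), (4.0,)]

a = distance_matrix(x)  # distances among the x_i
b = distance_matrix(y)  # distances among the y_i

for row_a, row_b in zip(a, b):
    print(row_a, row_b)
```

Both matrices are symmetric with zeros on the diagonal, and element (i, j) of `a` refers to the same pair of observations as element (i, j) of `b`.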

Now we have a lot of distances that can be paired. For example, element (2,3) of the first distance matrix is paired with element (2,3) of the second distance matrix. So we have a set of pairs of distances, and we can use it to calculate a correlation (a correlation between distances).

If the two sets of distances are correlated, then close Xs usually mean close Ys. For example, if $x_7$ is close to $x_{13}$, then $y_7$ is likely to be close to $y_{13}$. So we can conclude that the Xs and Ys are dependent.
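The naive idea described so far (pair up the distances and correlate them) can be sketched as follows. This is only my simplified picture, not the actual distance correlation: it uses scalar data, takes each pair $i < j$ once, and applies an ordinary Pearson correlation. The helper names `pairwise_distances` and `pearson` and the data are assumptions for illustration.

```python
import math

def pairwise_distances(values):
    """Distances |v_i - v_j| for all pairs i < j, always in the same order."""
    n = len(values)
    return [abs(values[i] - values[j]) for i in range(n) for j in range(i + 1, n)]

def pearson(u, v):
    """Ordinary Pearson correlation between two equally long sequences."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((p - mean_u) * (q - mean_v) for p, q in zip(u, v))
    var_u = sum((p - mean_u) ** 2 for p in u)
    var_v = sum((q - mean_v) ** 2 for q in v)
    return cov / math.sqrt(var_u * var_v)

# Hypothetical data: y is a decreasing linear function of x.
x = [0.1, 0.9, 1.5, 2.0, 3.2, 4.1]
y = [-3.0 * xi + 2.0 for xi in x]

r_raw = pearson(x, y)                                           # correlation of the values
r_dist = pearson(pairwise_distances(x), pairwise_distances(y))  # correlation of the distances
print(r_raw, r_dist)
```

Here the correlation between the values is negative, while the correlation between the distances is positive (1 up to rounding), because close Xs still mean close Ys.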

This sounds reasonable; however, there are two aspects that I do not understand.

First, to calculate distance correlation we do not use the two distance matrices directly. We first apply a double-centering procedure to them (so that the elements in every row and in every column sum to zero). I do not understand why we need to do this. What is the logic (or intuition) behind this step?
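For reference, this is the computation as I understand it, written as a minimal sketch. It assumes scalar data, absolute differences as distances, and the usual sample formulas (the squared distance covariance is the average of the elementwise products of the two centered matrices). All function names are my own, and I may be missing details of the official definition.

```python
import math

def distance_matrix(values):
    """n x n matrix of absolute differences |v_j - v_k|."""
    return [[abs(p - q) for q in values] for p in values]

def double_center(d):
    """A[j][k] = d[j][k] - row_mean[j] - col_mean[k] + grand_mean."""
    n = len(d)
    row_means = [sum(row) / n for row in d]
    col_means = [sum(d[j][k] for j in range(n)) / n for k in range(n)]
    grand_mean = sum(row_means) / n
    return [[d[j][k] - row_means[j] - col_means[k] + grand_mean
             for k in range(n)] for j in range(n)]

def mean_product(a, b):
    """Average of the elementwise products of two n x n matrices."""
    n = len(a)
    return sum(a[j][k] * b[j][k] for j in range(n) for k in range(n)) / (n * n)

def distance_correlation(x, y):
    a = double_center(distance_matrix(x))
    b = double_center(distance_matrix(y))
    dcov2 = mean_product(a, b)    # squared sample distance covariance
    dvar2_x = mean_product(a, a)  # squared sample distance variance of x
    dvar2_y = mean_product(b, b)  # squared sample distance variance of y
    if dvar2_x * dvar2_y == 0:
        return 0.0
    # max(..., 0.0) only guards against tiny negative rounding errors
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar2_x * dvar2_y))

# Hypothetical data: y depends on x, but their Pearson correlation is zero.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]

centered = double_center(distance_matrix(x))
print([sum(row) for row in centered])  # row sums after centering
print(distance_correlation(x, y))
```

After centering, every row and column sums to zero (up to rounding), and the diagonal entries of the centered matrix are in general no longer zero.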

Second, the original distance matrices have zeros on the diagonal. So if we calculate the correlation between the raw distances, we will get a statistically significant correlation simply because the zeros from the first matrix are paired with the corresponding zeros in the second matrix. How is this problem resolved?
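One way to probe this concern numerically is to correlate the raw (uncentered) distances twice, once with the diagonal pairs (0, 0) included and once without them. This is a rough sketch of my worry, not part of the distance correlation procedure. The sample size, the seed, the independent normal data and the helper `pearson` are all assumptions for illustration.

```python
import math
import random

def pearson(u, v):
    """Ordinary Pearson correlation between two equally long sequences."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((p - mean_u) * (q - mean_v) for p, q in zip(u, v))
    var_u = sum((p - mean_u) ** 2 for p in u)
    var_v = sum((q - mean_v) ** 2 for q in v)
    return cov / math.sqrt(var_u * var_v)

random.seed(0)
n = 100
x = [random.gauss(0, 1) for _ in range(n)]
y = [random.gauss(0, 1) for _ in range(n)]  # generated independently of x

# All n * n pairs of raw distances, including the n diagonal pairs (0, 0)
pairs_all = [(abs(x[i] - x[j]), abs(y[i] - y[j]))
             for i in range(n) for j in range(n)]
# The same pairs with the diagonal removed
pairs_off = [(abs(x[i] - x[j]), abs(y[i] - y[j]))
             for i in range(n) for j in range(n) if i != j]

r_with_diagonal = pearson(*zip(*pairs_all))
r_without_diagonal = pearson(*zip(*pairs_off))
print(r_with_diagonal, r_without_diagonal)
```

Comparing the two printed values shows how much the paired zeros on the diagonal shift the correlation of the raw distances.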


Tweeted twitter.com/StackStats/status/670326234908401664
a better title
Link
ttnphns

Understanding distance correlation computations

added 12 characters in body; edited title
Source Link
ttnphns

Question regarding distance correlation computations


minor grammar fixes
Source Link
Silverfish
Notice added Draw attention by Roman
Bounty Started worth 50 reputation by Roman
Source Link
Roman