As far as I understand, distance correlation is a robust and universal way to check whether there is a relation between two numeric variables. For example, if we have a set of pairs of numbers:
$(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ we can use distance correlation to check whether there is any (not necessarily linear) relation between the two variables ($x$ and $y$). Moreover, $x$ and $y$ can be vectors of different dimensions.
It is relatively easy to calculate distance correlation. First we use the $x_i$ to calculate a distance matrix. Then we calculate a distance matrix using the $y_i$. The two distance matrices have the same dimensions because the number of $x_i$ equals the number of $y_i$ (they come in pairs).
Now we have many distances that can be paired. For example, element $(2,3)$ of the first distance matrix is paired with element $(2,3)$ of the second distance matrix. So we have a set of pairs of distances, and we can use it to calculate a correlation (a correlation between distances).
If the two types of distances are correlated, then close $x$s usually imply close $y$s. For example, if $x_7$ is close to $x_{13}$, then $y_7$ is likely to be close to $y_{13}$. So we can conclude that the $x$s and $y$s are dependent.
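To make sure I have stated the procedure correctly, here is a minimal sketch (Python/NumPy, with made-up data; the variable names are mine) of the naive version I have in mind: build the two distance matrices, pair up corresponding entries, and compute an ordinary Pearson correlation between them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=(n, 2))                              # x_i can be vectors
y = x[:, :1] ** 2 + rng.normal(scale=0.1, size=(n, 1))   # nonlinear relation, y_i of different dimension

# Pairwise Euclidean distance matrices, one per variable.
a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)

# Naive idea: pair entry (i, j) of the first matrix with entry (i, j)
# of the second, then take a plain Pearson correlation of the pairs.
naive = np.corrcoef(a.ravel(), b.ravel())[0, 1]
```

Note that this sketch deliberately skips the double-centering step and keeps the zero diagonals, which is exactly what my two questions below are about.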
This sounds reasonable; however, there are two aspects that I do not understand.
First, to calculate distance correlation we do not use the two distance matrices directly. We apply a double-centering procedure to them (so that the sum of the elements in every row, and in every column, is equal to zero). I do not understand why we need to do this. What is the logic (or intuition) behind this step?
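For concreteness, this is the double-centering step as I understand it (a NumPy sketch; the helper name `double_center` is mine). As far as I can tell, the squared sample distance covariance is then just the mean of the elementwise product of the two centered matrices, and distance correlation is that quantity normalized by the distance variances:

```python
import numpy as np

def double_center(d):
    """Subtract each row mean and each column mean, then add back the
    grand mean. Every row and column of the result sums to zero."""
    return (d
            - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True)
            + d.mean())

# Tiny example: distance matrices of two 1-D samples, y an exact
# linear function of x.
x = np.array([0.0, 1.0, 3.0])
y = 2.0 * x
a = np.abs(x[:, None] - x[None, :])
b = np.abs(y[:, None] - y[None, :])

A, B = double_center(a), double_center(b)
dcov2 = (A * B).mean()                # squared sample distance covariance
dcor = np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))
# dcor is 1 here, since y is an exact linear function of x
```

Every row and column of `A` and `B` sums to zero after centering, which is the property I am asking about: why is this centering needed before correlating the distances?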
Second, the original distance matrices have zeros on the diagonal. So, if we calculate the correlation between the distances, we will get a statistically significant correlation just because the many zeros in the first matrix are paired with the corresponding zeros in the second matrix. How is this problem resolved?