- Compute the matrix of euclidean distances between the $N$ cases by variable $X$, and another such matrix by variable $Y$. Either of the two quantitative features, $X$ or $Y$, may be multivariate, not just univariate.
- Perform double centering of each matrix. (See how double centering is usually done.) However, in our case, when doing it, do not square the distances initially and do not divide by $-2$ in the end. Row means, column means, and the overall mean of the elements become zero.
- Multiply the two resulting matrices elementwise and compute the sum; or, equivalently, unwrap the matrices into two column vectors and compute their summed cross-product.
- Average, dividing the sum by the number of elements, $N^2$.
- Take the square root. The result is the distance covariance between $X$ and $Y$.
- Distance variances are the distance covariances of $X$ and of $Y$ with themselves; you compute them likewise, by points 3-4-5.
- Distance correlation is obtained from the three numbers analogously to how Pearson correlation is obtained from the usual covariance and the pair of variances: divide the covariance by the square root of the product of the two variances. (A code sketch of the whole recipe follows this list.)
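Here is a minimal sketch of the recipe in Python (assuming NumPy and SciPy; the names `distance_corr` and `_centered_dists` are mine, not from any particular library):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def _centered_dists(data):
    """Points 1-2: pairwise euclidean distances, double-centered
    WITHOUT squaring them first and WITHOUT dividing by -2."""
    data = np.asarray(data, dtype=float)
    if data.ndim == 1:                     # univariate -> one column
        data = data[:, None]
    d = squareform(pdist(data))            # N x N euclidean distances
    return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()

def distance_corr(x, y):
    """Sample distance correlation of two (possibly multivariate) samples."""
    a, b = _centered_dists(x), _centered_dists(y)
    n = a.shape[0]
    dcov2 = (a * b).sum() / n**2           # points 3-4: averaged cross-product
    dvarx2 = (a * a).sum() / n**2          # point 6: dCov of X with itself
    dvary2 = (b * b).sum() / n**2
    # point 5 takes the root of dcov2; point 7 divides by the square root
    # of the product of the two distance variances (themselves roots)
    return np.sqrt(dcov2) / (dvarx2 * dvary2) ** 0.25
```

With independent data `distance_corr` hovers near zero, while under a nonlinear dependence it stays clearly positive even where Pearson $r$ is near zero.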
In euclidean space, a scalar product is the similarity univocally tied to the corresponding distance. If you have two points (vectors), you may express their closeness as a scalar product instead of their distance without losing information.
Now, the usual double centering of the distance matrix (between the points of a cloud) is the operation of converting the distances into scalar products while placing the origin at the cloud's geometric middle. In doing so, the "network" of distances is equivalently replaced by the "burst" of vectors, of specific lengths and pairwise angles, emanating from the origin.
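A small demonstration of that equivalence (a sketch, assuming NumPy/SciPy; note that the classic double centering here does include the squaring and the $-1/2$ factor, unlike point 2 above):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(1)
pts = rng.normal(size=(5, 2))                # a small cloud of 5 points in 2D
d2 = squareform(pdist(pts)) ** 2             # squared euclidean distances

n = len(pts)
J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
gram = -0.5 * J @ d2 @ J                     # classic double centering

centered = pts - pts.mean(axis=0)            # origin placed at the centroid
print(np.allclose(gram, centered @ centered.T))  # True: scalar products recovered
```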
The additional taking of the square root afterwards (point 5) seems logical because in our case the moment was already itself a sort of covariance (a scalar product and a covariance are structural compeers), so it came out that covariances were, in a sense, multiplied twice. Therefore, in order to descend back to the level of the values of the original data (and to be able to compute a correlation value), one has to take the root afterwards.
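In symbols (writing $A$ and $B$ for the two centered matrices from point 2), points 3-4-5 amount to

$$\mathrm{dCov}(X,Y)=\sqrt{\frac{1}{N^{2}}\sum_{i,j=1}^{N}A_{ij}B_{ij}}\,,$$

and since each term $A_{ij}B_{ij}$ is a product of two covariance-like quantities, the outer root returns the result to the scale of the original data.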
One important note should finally be made. If we were doing the double centering its classic way - that is, after squaring the euclidean distances - then we would end up with a quantity that is not the true distance covariance and is not useful: it degenerates into a quantity exactly related to the usual covariance (and the distance correlation becomes a function of the linear Pearson correlation). What makes distance covariance/correlation unique and capable of measuring not linear association but a generic form of dependency, so that dCov=0 if and only if the variables are independent, is the lack of squaring of the distances when performing the double centering (see point 2). Actually, any power of the distances in the range $(0,2)$ would do; however, the standard form is to do it with power $1$. Why this power, and not power $2$, lets the coefficient become a measure of nonlinear interdependency is quite a tricky (for me) mathematical issue bearing on characteristic functions of distributions, and I would like somebody more educated to explain here the mechanics of distance covariance/correlation in possibly simple words (I once attempted it, unsuccessfully).
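The degeneration itself is easy to see numerically. Here is a sketch (assuming NumPy/SciPy; `dcor_with_power` is a hypothetical helper of mine that generalizes the recipe above to an arbitrary power of the distances, univariate data only):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def dcor_with_power(x, y, power):
    """Distance correlation, but raising the (univariate) distances
    to the given power before the non-classic double centering."""
    def centered(v):
        d = squareform(pdist(np.asarray(v, float)[:, None])) ** power
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    a, b = centered(x), centered(y)
    n = a.shape[0]
    dcov2 = (a * b).sum() / n**2
    return np.sqrt(dcov2) / (((a * a).sum() / n**2) * ((b * b).sum() / n**2)) ** 0.25

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x**2 + 0.1 * rng.normal(size=500)       # strong but nonlinear dependence

print(abs(np.corrcoef(x, y)[0, 1]))         # near 0: Pearson r misses it
print(dcor_with_power(x, y, 1))             # clearly positive: power 1 sees it
print(dcor_with_power(x, y, 2))             # equals |r|: power 2 degenerates
```

With power $2$ the coefficient collapses exactly to $|r|$, whereas with power $1$ it detects the quadratic dependence that Pearson correlation cannot.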