8
$\begingroup$

I have been learning about standard methods in statistics such as Pearson's Correlation Coefficient, Spearman's Correlation and Kendall's Tau.

My understanding of this so far is that:

  • Pearson Correlation Coefficient measures the linear correlation between two sets of data

  • Spearman's Correlation measures the "monotonicity" between two sets of data (e.g. do they both increase and decrease at the same time?)

  • Kendall's Tau measures the ordinal association between two sets of data - supposedly Kendall's Tau is similar to the Spearman Correlation, but Kendall's Tau has more logical confidence intervals.

I had the following question - can any of these methods be used for measuring a specific form of "Non Linear Correlation" between two sets of data?

For example - suppose I want to see how strongly two sets of data are correlated relative to a "second order curve" :

enter image description here

Is there something that could measure the "curved correlation"?

The two ideas I came up with:

  • Try to use some data transformations (e.g. Log) to transform one of the variables into a more linear pattern that will make it suitable for one of the above measures

  • Fit a polynomial regression model (of order 2) to this data and measure the MSE

But I am not sure if either of these approaches is suitable.
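A minimal sketch of the second idea, assuming NumPy and synthetic data (the data, seed and coefficients below are illustrative and not taken from the plot above). It fits a second-order polynomial by least squares and reports the MSE alongside $R^2$, which is scale-free and therefore closer in spirit to a correlation:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; the data are made up for illustration

# Hypothetical data following a second-order curve plus noise
x = np.linspace(-3, 3, 200)
y = 1.5 * x**2 - 0.5 * x + rng.normal(scale=1.0, size=x.size)

# Least-squares fit of y = a*x^2 + b*x + c
coeffs = np.polyfit(x, y, deg=2)
y_hat = np.polyval(coeffs, x)

# MSE depends on the units of y; R^2 does not
mse = np.mean((y - y_hat) ** 2)
r_squared = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# Pearson correlation between fitted and observed values;
# for a least-squares fit with an intercept its square equals R^2
r_fit = np.corrcoef(y_hat, y)[0, 1]

# Plain Pearson correlation between x and y, for comparison
pearson_xy = np.corrcoef(x, y)[0, 1]

print("MSE:", mse)
print("R^2:", r_squared)
print("corr(fitted, observed):", r_fit)
print("Pearson(x, y):", pearson_xy)
```

Note that both numbers are in-sample quantities, so they can only improve if a higher polynomial order is used.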

$\endgroup$
4
  • $\begingroup$ Interesting question. Some of the trouble of defining a curved correlation will be deciding on what kind of curvature you want to measure. After all, a logarithm-type of graph has different curvature than a quadratic. Further, determining the sign will be challenging, since many curves (such as quadratics) allow for increasing and decreasing sections. I’ve wondered if the concavity of a parabola (up-opening vs down-opening) could be used for this, but parabolas are just one type of curve. (Maybe you can do this if you restrict to convex or concave functions.) $\endgroup$ Commented Nov 9, 2022 at 6:59
  • 2
    $\begingroup$ (1) What do you mean by "measuring"? If you want a measure of the "strength" of such a correlation, then you could indeed run a polynomial regression and report the MSE. Possibly cross-validated, otherwise if you re-ran this for higher order polynomials, you would "find" that the "second-order correlation" is smaller than the "third-order correlation" and so on. Conversely, if you want to do statistical inference, the null and alternative hypotheses will need some thinking about - are $x$ and $x^3$ for $-1<x<1$ "significantly second order correlated"? ... $\endgroup$ Commented Nov 9, 2022 at 7:41
  • 2
    $\begingroup$ ... (2) Especially for inference, the question comes up whether you want to test a specific polynomial correlation, or a general second-order polynomial, or a general polynomial of up to second order. Perhaps you could explain what you want to do with such a nonlinear correlation? $\endgroup$ Commented Nov 9, 2022 at 7:42
  • 2
    $\begingroup$ Another way to consider Stephan's comments is that every regression you could estimate for the two variables in your plot is, in a sense, a correlation measurement. Testing and comparing arbitrarily many regressions has problems with false discovery and statistical validity, so "just try stuff" isn't a great way to go about it: you need to be specific about what questions you want to ask your data & how you want to ask it. The plot you show is roughly monotonic & Spearman's correlation would characterize the extent. Lots of nonlinear functions are monotonic, so Spearman's is an answer. $\endgroup$ Commented Nov 10, 2022 at 3:28
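The point raised in the comments about cross-validating the polynomial fit can be sketched as follows. This assumes NumPy and synthetic quadratic data; `in_sample_mse` and `cv_mse` are hypothetical helper names introduced only for this illustration:

```python
import numpy as np

def in_sample_mse(x, y, degree):
    """MSE of a degree-`degree` polynomial evaluated on the data it was fitted to."""
    coeffs = np.polyfit(x, y, deg=degree)
    return float(np.mean((y - np.polyval(coeffs, x)) ** 2))

def cv_mse(x, y, degree, k=5, seed=0):
    """Average held-out MSE of a degree-`degree` polynomial under k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(x.size), k)
    errors = []
    for test_idx in folds:
        train_mask = np.ones(x.size, dtype=bool)
        train_mask[test_idx] = False
        # Fit on the training part only, then score on the held-out fold
        coeffs = np.polyfit(x[train_mask], y[train_mask], deg=degree)
        pred = np.polyval(coeffs, x[test_idx])
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(errors))

# Illustrative data: a quadratic signal plus Gaussian noise
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=150)
y = x**2 + rng.normal(scale=1.0, size=x.size)

for degree in (1, 2, 3, 4):
    print(degree, round(in_sample_mse(x, y, degree), 3), round(cv_mse(x, y, degree), 3))
```

Because the polynomial models are nested, the in-sample MSE cannot increase with the degree, whereas the cross-validated MSE typically stops improving once the degree matches the underlying curve.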

2 Answers

6
$\begingroup$

You may be interested in distance correlation.

Distance correlation measures both linear and nonlinear association between two random variables or random vectors. This is in contrast to Pearson's correlation, which can only detect a linear association between two random variables.

Wikipedia link
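As a rough illustration, the sample distance correlation for two one-dimensional samples can be computed from double-centered pairwise distance matrices. The sketch below assumes NumPy only; `distance_correlation` is a name chosen for this example, not a library function, and the random-vector case is not covered:

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples of equal length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of the same length")

    # Pairwise absolute distances within each sample
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])

    # Double centering: subtract row and column means, add back the grand mean
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()

    dcov2 = (A * B).mean()    # squared sample distance covariance
    dvar_x = (A * A).mean()   # squared sample distance variance of x
    dvar_y = (B * B).mean()   # squared sample distance variance of y

    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0  # at least one sample is constant
    # max() guards against tiny negative values from rounding
    return float(np.sqrt(max(dcov2, 0.0) / denom))

# Symmetric quadratic: Pearson is essentially zero, distance correlation is not
x = np.linspace(-3, 3, 200)
y = x**2
print("Pearson:", np.corrcoef(x, y)[0, 1])
print("Distance correlation:", distance_correlation(x, y))
```

This version builds full n-by-n distance matrices, so it is only practical for moderately sized samples.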

$\endgroup$
1
6
$\begingroup$

I found an interesting article here, published on 31 Mar 2024, that introduces a new correlation coefficient. In my tests it works quite well with non-linear relationships (including a quadratic function). The Python implementation is found here.

Below is the code:

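    import numpy as np
    from scipy.stats import pearsonr
    # xicor() comes from the Python implementation linked above

    x = np.linspace(-3.14, 3.14, 100)
    y = x**2 + np.random.random(len(x))

    # pearsonr returns (coefficient, p-value)
    print(np.round(pearsonr(x, y), 4))  # => [-0.0065 0.9485]
    print(np.round(xicor(x, y), 4))     # => [0.8611 0. ]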

As can be seen, the xicor() coefficient (0.8611) is much closer to 1 than the Pearson coefficient (-0.0065). If you remove the random noise, the output will be [0.9408 0. ].
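For readers without access to the linked implementation, here is a minimal sketch under the assumption that `xicor` computes Chatterjee's rank correlation coefficient ξ (the name and the noise-free value above are consistent with that, but the linked code may differ in details such as tie handling). The function name `xi_coefficient` is hypothetical, and unlike `xicor()` it returns only the coefficient, not a p-value:

```python
import numpy as np

def xi_coefficient(x, y):
    """Chatterjee's xi for 1-D samples (assumed to be what xicor computes)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of the same length")
    n = x.size

    # Reorder y by increasing x; ties in x are broken by input order here,
    # which is a simplification (random tie-breaking is another option)
    y_by_x = y[np.argsort(x, kind="stable")]

    sorted_y = np.sort(y_by_x)
    # r[i] = number of y values <= y_by_x[i]
    r = np.searchsorted(sorted_y, y_by_x, side="right")
    # l[i] = number of y values >= y_by_x[i]
    l = n - np.searchsorted(sorted_y, y_by_x, side="left")

    denom = 2.0 * np.sum(l * (n - l))
    if denom == 0:
        raise ValueError("xi is undefined when y is constant")
    return float(1.0 - n * np.abs(np.diff(r)).sum() / denom)

# Same setup as in the snippet above, with a seeded generator for repeatability
rng = np.random.default_rng(0)
x = np.linspace(-3.14, 3.14, 100)
y = x**2 + rng.random(x.size)
print(round(xi_coefficient(x, y), 4))
```

Without ties in y the expression reduces to 1 - 3 * sum(|r[i+1] - r[i]|) / (n**2 - 1), so a noise-free monotonic relationship with n points gives 1 - 3 / (n + 1) rather than exactly 1.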

The images below were reported by the author.

enter image description here enter image description here

$\endgroup$
