$\begingroup$

I have a dataset of monthly data. One column is my target variable and all the others are my features. I computed the correlation between my target and each feature, then fitted a linear regression and obtained my betas and R2.

Now my question is more theoretical. If I oversample to daily data (using linear interpolation) and recompute the correlation, betas and R2, they change a lot. Can anybody explain why that happens? Is correlation affected by oversampling? I might expect my betas and the R2 to change because I have much more data after oversampling, but not really the correlation, given that my monthly sample was already quite large. Thanks

$\endgroup$

1 Answer

$\begingroup$

When you compute the correlation coefficient between the target variable (denoted $x$) and a feature variable (denoted $y$), the result is built from sums over all $n$ observations in the sample:

$ r = \frac{n \Sigma xy - \Sigma x \Sigma y}{\sqrt{(n\Sigma x^2 - (\Sigma x)^2 )(n\Sigma y^2 - (\Sigma y)^2 )}}$

Interpolating to daily data changes $n$ and adds new points to every one of these sums, so the correlation will generally change as well.
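As a rough illustration, here is a minimal Python sketch that evaluates the formula above on a synthetic monthly series and on its linearly interpolated version. Everything in it is an assumption made for the example, not taken from the question: the data are random, `pearson_r` and `upsample_linear` are hypothetical helper names, and 30 points per month is an arbitrary stand-in for "daily".

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation written with the same sums as the formula above."""
    n = len(x)
    num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    den = np.sqrt((n * np.sum(x**2) - np.sum(x)**2)
                  * (n * np.sum(y**2) - np.sum(y)**2))
    return num / den

def upsample_linear(values, points_per_interval=30):
    """Linearly interpolate between consecutive observations.

    points_per_interval=30 is an assumed stand-in for days per month.
    """
    n = len(values)
    coarse_t = np.arange(n)
    fine_t = np.linspace(0, n - 1, (n - 1) * points_per_interval + 1)
    return np.interp(fine_t, coarse_t, values)

# Synthetic monthly data (assumption: a noisy linear relationship).
rng = np.random.default_rng(0)
n_months = 120
x_monthly = rng.normal(size=n_months)
y_monthly = 0.5 * x_monthly + rng.normal(size=n_months)

# "Oversample" both series with the same linear interpolation.
x_daily = upsample_linear(x_monthly)
y_daily = upsample_linear(y_monthly)

r_monthly = pearson_r(x_monthly, y_monthly)
r_daily = pearson_r(x_daily, y_daily)
print("monthly r:", r_monthly)
print("daily r:  ", r_daily)
```

The two printed values will generally not be identical, because the interpolated points are weighted averages of neighbouring monthly observations and they enter every sum in the formula. How large the gap is depends on the data, so the behaviour on a real dataset may differ from this toy case.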

$\endgroup$
  • $\begingroup$ Yes, I see, thanks for the answer. Do you know if it is possible to oversample two time series while forcing them to keep the same correlation they had before being oversampled? Thanks $\endgroup$ Commented Jul 15, 2020 at 14:14
  • $\begingroup$ To my knowledge, there is no formula to convert a daily correlation into a monthly one. If you want to fix the correlation, use the Solver function in Excel: set the correlation cell as the target and let the feature variable cells be the ones that vary. But this is called fiddling. $\endgroup$ Commented Jul 16, 2020 at 6:24
