I want to compute the correlation between two different columns from the same data frame. This is the code I use:

Correlation_unemp_demvote = np.corrcoef(New_table['unemp'], New_table['demVote'])
Correlation_unemp_demvote

The outcome is as follows:

array([[ 1.        ,  0.34167764],
       [ 0.34167764,  1.        ]])

I was expecting a single value between -1 and 1, as the definition of the correlation coefficient implies. Could you explain the result I got? I've also seen several functions related to correlation, such as corr() and correlate(). Which one should I use?

Thanks,

1 Answer

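np.corrcoef returns the full correlation matrix, so the coefficient you expected is the off-diagonal entry (0.34167764); the 1s on the diagonal are each column's correlation with itself. To get a single number directly, pd.Series.corr is what you want.
Do this instead: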

Correlation_unemp_demvote = New_table['unemp'].corr(New_table['demVote']) 

Example:

import numpy as np
import pandas as pd

# Random data, so the printed value differs from run to run
df = pd.DataFrame(np.random.rand(10, 2), columns=list('AB'))
df.A.corr(df.B)

-0.1814956009745472
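To connect this back to the matrix in the question, here is a minimal sketch comparing the two approaches. It uses a made-up random frame with columns A and B (an assumption, standing in for New_table) and a seeded generator so the run is repeatable. As far as I know, np.correlate is a different tool altogether: it computes the cross-correlation of two sequences, as used in signal processing, not the Pearson coefficient.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for New_table: two random columns, seeded for repeatability
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 2)), columns=list("AB"))

# np.corrcoef returns the 2x2 correlation matrix of the two inputs
corr_matrix = np.corrcoef(df["A"], df["B"])

# The off-diagonal entry is the Pearson coefficient between A and B
r_numpy = corr_matrix[0, 1]

# Series.corr returns that same coefficient as a single float
r_pandas = df["A"].corr(df["B"])

print(corr_matrix.shape)              # (2, 2)
print(np.isclose(r_numpy, r_pandas))  # True
```

So either np.corrcoef(x, y)[0, 1] or x.corr(y) gives the single value; the pandas form just skips the indexing step.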