9

I have a set of columns (col1,col2,col3) in dataframe df1 I have another set of columns (col4,col5,col6) in dataframe df2 Assume this two dataframes has the same number of rows.

How do I generate a correlation table that do pairwise correlation between df1 and df2?

the table will look like

 col1 col2 col3 col4 .. .. .. col5 .. .. .. col6 .. .. .. 

I use df1.corrwith(df2), it does not seem to generate the table as required.

I have seen the answer at How to check correlation between matching columns of two data sets?, but the main difference is that the col names does not matched.

1 Answer 1

25

pandas quick and dirty

pd.concat([df1, df2], axis=1, keys=['df1', 'df2']).corr().loc['df2', 'df1'] 

numpy clean

def corr(df1, df2): n = len(df1) v1, v2 = df1.values, df2.values sums = np.multiply.outer(v2.sum(0), v1.sum(0)) stds = np.multiply.outer(v2.std(0), v1.std(0)) return pd.DataFrame((v2.T.dot(v1) - sums / n) / stds / n, df2.columns, df1.columns) corr(df1, df2) 

example

df1 = pd.DataFrame(np.random.rand(10, 4), columns=list('abcd')) df2 = pd.DataFrame(np.random.rand(10, 3), columns=list('xyz')) 

pd.concat([df1, df2], axis=1, keys=['df1', 'df2']).corr().loc['df2', 'df1'] a b c d x 0.235624 0.844665 -0.647962 0.535562 y 0.357994 0.462007 0.205863 0.424568 z 0.688853 0.350318 0.132357 0.687038 

corr(df1, df2) a b c d x 0.235624 0.844665 -0.647962 0.535562 y 0.357994 0.462007 0.205863 0.424568 z 0.688853 0.350318 0.132357 0.687038 
Sign up to request clarification or add additional context in comments.

7 Comments

Thanks, but I am comparing categorical fields instead, I decided to ask it as a new question, could you help take a look here, thanks in advance: stackoverflow.com/questions/41827716/…
@piRSquared Why do you say that the pandas solution is 'dirty'? I think it is a great one-liner that should be part of pandas package
@tashuhka you’re right. More quick and less dirty (-:
I tried to use this but I get KeyError: ('df1', 'df2')
I tried to use this but I get different correlation values from the actual/manual calculation in excel.
|