Finding correlation for corresponding columns in dataframe

Question

I have two data frames with 200 columns each. For illustration I am using only 3 columns here.

Dataframe df1 as:

               A   B   C
    1/4/2017   5   6   6
    1/5/2017   5   2   1
    1/6/2017   6   2  10
    1/9/2017   1   9  10
    1/10/2017  6   6   4
    1/11/2017  6   1   1
    1/12/2017  1   7  10
    1/13/2017  8   9   6

Dataframe df2:

                A   D   B
    1/4/2017    8  10   2
    1/5/2017    2   1   8
    1/6/2017    6   6   6
    1/9/2017    1   8   1
    1/10/2017  10   6   2
    1/11/2017  10   2   4
    1/12/2017   5   4  10
    1/13/2017   5   2   8

I want to calculate the following correlation matrix for corresponding columns of df1 and df2:

                  A      B
    1/4/2017
    1/5/2017
    1/6/2017   0.19  -0.94
    1/9/2017   0.79  -0.96
    1/10/2017  0.90  -0.97
    1/11/2017  1.00  -1.00
    1/12/2017  1.00   0.42
    1/13/2017  0.24   0.84

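That is, for each date I need the correlation between the same-named columns of df1 and df2, computed over the trailing 3 days of data. The first two dates are empty because fewer than 3 days are available.

For example, the value for A on 1/6/2017 is corr([5, 5, 6], [8, 2, 6]) = 0.19, where [5, 5, 6] comes from df1['A'] and [8, 2, 6] comes from df2['A'].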
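As a sanity check on that single number, here is a minimal sketch that recomputes the first window with NumPy. Using `np.corrcoef` is an assumption made for illustration; the question does not say how the value was obtained.

```python
import numpy as np

# First trailing 3-day window of column A (rows 1/4/2017 to 1/6/2017).
a1 = np.array([5, 5, 6])  # from df1['A']
a2 = np.array([8, 2, 6])  # from df2['A']

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
# entry is the Pearson correlation between the two windows.
r = np.corrcoef(a1, a2)[0, 1]
print(f"{r:.2f}")  # prints 0.19
```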

Since I have 200 columns in each data frame, I am finding it extremely cumbersome to run two nested for loops: one over the columns and one over the trailing 3-day windows.

BENY · Accepted Answer · 2017-10-12 02:28:32Z

Is this what you need?

    import pandas as pd

    l = []
    cols = df1.columns.intersection(df2.columns)  # columns present in both frames
    for x in cols:
        # The original call was pd.rolling_corr(df1[x], df2[x], window=3), which is
        # deprecated and removed in later pandas releases; use .rolling().corr() instead.
        l.append(df1[x].rolling(3).corr(df2[x]))
    pd.concat(l, axis=1)

    Out[13]:
                      A         B
    1/4/2017        NaN       NaN
    1/5/2017        NaN       NaN
    1/6/2017   0.188982 -0.944911
    1/9/2017   0.785714 -0.960769
    1/10/2017  0.896258 -0.968620
    1/11/2017  1.000000 -0.998906
    1/12/2017  1.000000  0.423415
    1/13/2017  0.240192  0.838628
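The per-column loop above can probably be dropped altogether, since `DataFrame.rolling(...).corr(other)` matches columns by name when `other` is a DataFrame. The following is a minimal sketch under that assumption, not part of the original answer; the frames are rebuilt from the question's sample data, the dates are parsed into a `DatetimeIndex` for convenience, and the names `shared` and `result` are made up for illustration.

```python
import pandas as pd

# Sample data from the question; the dates are parsed so the index is a DatetimeIndex.
dates = pd.to_datetime(
    ["1/4/2017", "1/5/2017", "1/6/2017", "1/9/2017",
     "1/10/2017", "1/11/2017", "1/12/2017", "1/13/2017"],
    format="%m/%d/%Y",
)
df1 = pd.DataFrame({"A": [5, 5, 6, 1, 6, 6, 1, 8],
                    "B": [6, 2, 2, 9, 6, 1, 7, 9],
                    "C": [6, 1, 10, 10, 4, 1, 10, 6]}, index=dates)
df2 = pd.DataFrame({"A": [8, 2, 6, 1, 10, 10, 5, 5],
                    "D": [10, 1, 6, 8, 6, 2, 4, 2],
                    "B": [2, 8, 6, 1, 2, 4, 10, 8]}, index=dates)

# Restrict both frames to the columns they share, then let pandas pair
# the columns by name and compute the trailing 3-row correlation.
shared = df1.columns.intersection(df2.columns)
result = df1[shared].rolling(3).corr(df2[shared])
print(result.round(6))
```

Selecting the shared columns first keeps columns such as C and D, which exist in only one frame, out of the result.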

@piRSquared, take a look at this. I always like to learn the numpy method!

piRSquared · Accepted Answer · 2017-10-12 20:46:11Z

Option 1
I built a generator and wrapped it in pd.concat

    def rolling_corrwith(d1, d2, window):
        # keep only the rows and columns the two frames share
        d1, d2 = d1.align(d2, 'inner')
        for i in range(len(d1) - window + 1):
            j = i + window
            # correlate matching columns over rows i..j-1, labelled with the window's last date
            yield d1.iloc[i:j].corrwith(d2.iloc[i:j]).rename(d1.index[j-1])

    pd.concat(list(rolling_corrwith(df1, df2, 3)), axis=1).T

                      A         B
    1/6/2017   0.188982 -0.944911
    1/9/2017   0.785714 -0.960769
    1/10/2017  0.896258 -0.968620
    1/11/2017  1.000000 -0.998906
    1/12/2017  1.000000  0.423415
    1/13/2017  0.240192  0.838628

Option 2
Using numpy strides. I don't recommend this approach, but it's worth mentioning for those who are interested.

    from numpy.lib.stride_tricks import as_strided as strided

    def sprp(v, w):
        # view of v with shape (n + 1 - w, w, m): one w-row window per start row, no copy
        s0, s1 = v.strides
        n, m = v.shape
        return strided(v, (n + 1 - w, w, m), (s0, s0, s1))

    def rolling_corrwith2(d1, d2, window):
        d1, d2 = d1.align(d2, 'inner')
        s1 = sprp(d1.values, window)
        s2 = sprp(d2.values, window)
        # per-window means and (population) standard deviations
        m1 = s1.mean(1, keepdims=1)
        m2 = s2.mean(1, keepdims=1)
        z1 = s1.std(1)
        z2 = s2.std(1)
        # covariance over each window divided by the product of the deviations
        c = ((s1 - m1) * (s2 - m2)).sum(1) / z1 / z2 / window
        return pd.DataFrame(c, d1.index[window - 1:], d1.columns)

    rolling_corrwith2(df1, df2, 3)

                      A         B
    1/6/2017   0.188982 -0.944911
    1/9/2017   0.785714 -0.960769
    1/10/2017  0.896258 -0.968620
    1/11/2017  1.000000 -0.998906
    1/12/2017  1.000000  0.423415
    1/13/2017  0.240192  0.838628
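For readers who want the strided idea without hand-computing strides, a possible variant uses `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later), which builds a read-only windowed view and checks the shapes for you. This is a sketch under that assumption and not part of the original answer; the function name `rolling_corrwith3` is hypothetical, and the frames are rebuilt from the question's sample data.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

dates = pd.to_datetime(
    ["1/4/2017", "1/5/2017", "1/6/2017", "1/9/2017",
     "1/10/2017", "1/11/2017", "1/12/2017", "1/13/2017"],
    format="%m/%d/%Y",
)
df1 = pd.DataFrame({"A": [5, 5, 6, 1, 6, 6, 1, 8],
                    "B": [6, 2, 2, 9, 6, 1, 7, 9],
                    "C": [6, 1, 10, 10, 4, 1, 10, 6]}, index=dates)
df2 = pd.DataFrame({"A": [8, 2, 6, 1, 10, 10, 5, 5],
                    "D": [10, 1, 6, 8, 6, 2, 4, 2],
                    "B": [2, 8, 6, 1, 2, 4, 10, 8]}, index=dates)

def rolling_corrwith3(d1, d2, window):  # hypothetical name, for illustration only
    # Keep only the rows and columns the two frames share.
    d1, d2 = d1.align(d2, join="inner")
    # Windowed views of shape (n - window + 1, m, window); the window axis comes last.
    s1 = sliding_window_view(d1.to_numpy(dtype=float), window, axis=0)
    s2 = sliding_window_view(d2.to_numpy(dtype=float), window, axis=0)
    m1 = s1.mean(-1, keepdims=True)
    m2 = s2.mean(-1, keepdims=True)
    # Population covariance per window, divided by the population standard deviations.
    cov = ((s1 - m1) * (s2 - m2)).mean(-1)
    c = cov / (s1.std(-1) * s2.std(-1))
    return pd.DataFrame(c, index=d1.index[window - 1:], columns=d1.columns)

out = rolling_corrwith3(df1, df2, 3)
print(out.round(6))
```

As with Option 2, a window in which either column is constant has zero standard deviation and yields a division by zero, so this sketch only suits data where that cannot happen or where the resulting NaN or inf values are acceptable.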
