Here's one that applies the correlation definition directly, using NumPy tools meant for performance, via corr2_coeff_rowwise -
    pd.Series(corr2_coeff_rowwise(dfa.values,dfb.values))
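The call above relies on corr2_coeff_rowwise, whose definition isn't shown here. As a rough sketch only, assuming it computes the Pearson correlation between corresponding rows and broadcasts a single-row input against the other array, something along these lines would behave the same way. The name corr2_coeff_rowwise_sketch is hypothetical, and the original function may be implemented differently -

```python
import numpy as np

def corr2_coeff_rowwise_sketch(A, B):
    """Pearson correlation between corresponding rows of A and B.

    Hypothetical stand-in for corr2_coeff_rowwise. A and B are 2-D arrays
    with the same number of columns; a single-row input is broadcast
    against the rows of the other one.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)

    # Center every row on its own mean
    A_c = A - A.mean(axis=1, keepdims=True)
    B_c = B - B.mean(axis=1, keepdims=True)

    # Row-wise sums of squared deviations
    ss_A = (A_c ** 2).sum(axis=1)
    ss_B = (B_c ** 2).sum(axis=1)

    # Row-wise sum of cross products, divided by the product of the norms
    return (A_c * B_c).sum(axis=1) / np.sqrt(ss_A * ss_B)

# Same numbers as in the sample run
a = np.array([[2.0, 6.0, 8.0, 12.0]])
b = np.array([[2, 6, 8, 12], [1, 3, 4, 6], [-1, -3, -4, -6]])
result = corr2_coeff_rowwise_sketch(a, b)
```

Note that a row with zero variance (all values equal) makes the denominator zero, so the corresponding entry comes out as NaN with a runtime warning.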
Sample run -
    In [74]: dfa
    Out[74]:
         a    b    c     d
    0  2.0  6.0  8.0  12.0

    In [75]: dfb
    Out[75]:
       a  b  c   d
    0  2  6  8  12
    1  1  3  4   6
    2 -1 -3 -4  -6

    In [76]: pd.Series(corr2_coeff_rowwise(dfa.values,dfb.values))
    Out[76]:
    0    1.0
    1    1.0
    2   -1.0
    dtype: float64
Runtime test
Case #1 : Large number of rows in dfb and 4 columns -
    In [77]: dfa = pd.DataFrame(np.random.randint(1,100,(1,4)))

    In [78]: dfb = pd.DataFrame(np.random.randint(1,100,(30000,4)))

    # @sera's soln
    In [79]: %timeit dfb.corrwith(dfa.iloc[0], axis=1)
    1 loop, best of 3: 4.09 s per loop

    In [80]: %timeit pd.Series(corr2_coeff_rowwise(dfa.values,dfb.values))
    1000 loops, best of 3: 1.53 ms per loop
Case #2 : Moderate number of rows in dfb (300) and 400 columns -
    In [83]: dfa = pd.DataFrame(np.random.randint(1,100,(1,400)))

    In [85]: dfb = pd.DataFrame(np.random.randint(1,100,(300,400)))

    In [86]: %timeit dfb.corrwith(dfa.iloc[0], axis=1)
    10 loops, best of 3: 44.8 ms per loop

    In [87]: %timeit pd.Series(corr2_coeff_rowwise(dfa.values,dfb.values))
    1000 loops, best of 3: 635 µs per loop