I have many dataframes. They all share the same column structure: "date", "open_position_profit", and more columns.

             date  open_position_profit  col2   col3
    0  2008-04-01                -260.0     1  290.0
    1  2008-04-02                -340.0     1  -60.0
    2  2008-04-03                 100.0     1   40.0
    3  2008-04-04                 180.0     1  -90.0
    4  2008-04-05                   0.0     0    0.0

Although "date" is present in all dataframes, they might not have the same number of rows: some dates might appear in one dataframe but not in another.

I want to compute a correlation matrix of the columns "open_position_profit" of all these dataframes.

I've tried this

    dfs = [df1[["date", "open_position_profit"]],
           df2[["date", "open_position_profit"]],
           ...]
    pd.concat(dfs).groupby('date', as_index=False).corr()

But this gives me a separate correlation for each date, each computed from a single cell:

                              open_position_profit
    0  open_position_profit                    1.0
    1  open_position_profit                    1.0
    2  open_position_profit                    1.0
    3  open_position_profit                    1.0
    4  open_position_profit                    NaN

I want the correlation for the entire time series, not each single cell. How can I do this?

1 Answer

If I understand your intention correctly, you need to do an outer join first. The following code performs an outer join on the "date" key; dates missing from one dataframe show up as NaN.

    df = pd.merge(df1, df2, on='date', how='outer')

             date  open_position_profit_x  open_position_profit_y  ...
    0  2019-01-01                     ...                     ...
    1  2019-01-02                     ...                     ...
    2  2019-01-03                     ...                     ...
    3  2019-01-04                     ...                     ...
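As an aside, the _x/_y names come from pd.merge's suffixes parameter (whose default is ('_x', '_y')), so you can pass clearer names if you prefer. The '_df1'/'_df2' suffixes below are only an illustration; the rest of this answer keeps the defaults:

    df = pd.merge(df1, df2, on='date', how='outer', suffixes=('_df1', '_df2'))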

Then you can calculate the correlation with the new DataFrame.

    df.corr()

                             open_position_profit_x  open_position_profit_y  ...
    open_position_profit_x                 1.000000                0.866025  ...
    open_position_profit_y                 0.866025                1.000000  ...
    ...                                         ...                     ...  ...
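The merge above handles two dataframes, while the question mentions many. One way to generalize (a sketch, not part of the original answer; df_list and the profit_<i> column names are my own illustration) is to rename each "open_position_profit" column so they don't collide, then chain outer merges with functools.reduce:

    from functools import reduce

    import pandas as pd

    # assumed: df_list holds all of your dataframes (df1, df2, ...)
    df_list = [df1, df2]

    # keep only the relevant columns and give each profit column a
    # distinct name so the merged frame has one column per dataframe
    renamed = [
        df[["date", "open_position_profit"]]
          .rename(columns={"open_position_profit": f"profit_{i}"})
        for i, df in enumerate(df_list)
    ]

    # chain outer joins on "date"; dates missing from a frame become NaN
    merged = reduce(
        lambda left, right: pd.merge(left, right, on="date", how="outer"),
        renamed,
    )

    # pairwise correlation of every profit column over the whole series
    corr_matrix = merged.drop(columns="date").corr()

Note that DataFrame.corr() computes each pairwise correlation from the rows where both columns are non-NaN, so dates missing from one dataframe simply drop out of that pair's calculation.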

See: pd.merge
