I have many dataframes. They all share the same column structure: "date", "open_position_profit", and more columns.
             date  open_position_profit  col2   col3
    0  2008-04-01                -260.0     1  290.0
    1  2008-04-02                -340.0     1  -60.0
    2  2008-04-03                 100.0     1   40.0
    3  2008-04-04                 180.0     1  -90.0
    4  2008-04-05                   0.0     0    0.0

Although "date" is present in all dataframes, they might or might not have the same length (some dates might be in one dataframe but not the other).
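For reference, a pair of dataframes matching this structure can be built like so (the values in `df2` are made up for illustration; only `df1` mirrors the printout above, and the dates overlap only partially):

```python
import pandas as pd

# df1 mirrors the sample printout above
df1 = pd.DataFrame({
    "date": pd.to_datetime(["2008-04-01", "2008-04-02", "2008-04-03",
                            "2008-04-04", "2008-04-05"]),
    "open_position_profit": [-260.0, -340.0, 100.0, 180.0, 0.0],
    "col2": [1, 1, 1, 1, 0],
    "col3": [290.0, -60.0, 40.0, -90.0, 0.0],
})

# df2 uses hypothetical values; note it shares only some dates with df1
df2 = pd.DataFrame({
    "date": pd.to_datetime(["2008-04-02", "2008-04-03", "2008-04-04",
                            "2008-04-06"]),
    "open_position_profit": [-120.0, 80.0, 200.0, 50.0],
    "col2": [1, 0, 1, 1],
    "col3": [10.0, 20.0, -30.0, 40.0],
})
```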
I want to compute a correlation matrix of the columns "open_position_profit" of all these dataframes.
I've tried this:
    dfs = [df1[["date", "open_position_profit"]],
           df2[["date", "open_position_profit"]],
           ...]
    pd.concat(dfs).groupby('date', as_index=False).corr()

But this gives me a series of correlations, one per date group:
                             open_position_profit
    0  open_position_profit                   1.0
    1  open_position_profit                   1.0
    2  open_position_profit                   1.0
    3  open_position_profit                   1.0
    4  open_position_profit                   NaN

I want the correlation for the entire time series, not for each single date group. How can I do this?
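For what it's worth, one direction I've been considering (not sure if it's idiomatic) is to first align all the series on "date" as columns of a single wide dataframe, then call `.corr()` once on that. A minimal sketch, using two small hypothetical dataframes in place of my real ones:

```python
import pandas as pd

# Hypothetical stand-ins for the many dataframes; dates overlap only partially
df1 = pd.DataFrame({
    "date": pd.to_datetime(["2008-04-01", "2008-04-02", "2008-04-03"]),
    "open_position_profit": [-260.0, -340.0, 100.0],
})
df2 = pd.DataFrame({
    "date": pd.to_datetime(["2008-04-02", "2008-04-03", "2008-04-04"]),
    "open_position_profit": [-120.0, 80.0, 200.0],
})

# Concatenate along columns keyed by date: an outer join keeps all dates,
# leaving NaN where a dataframe has no row for that date
wide = pd.concat(
    [df.set_index("date")["open_position_profit"] for df in (df1, df2)],
    axis=1, keys=["df1", "df2"],
)

# Pairwise correlation over the full (overlapping) time series
corr_matrix = wide.corr()
print(corr_matrix)
```

This at least produces one square matrix instead of per-date values, since `DataFrame.corr()` drops the non-overlapping dates pairwise.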