This question follows up on the previous post.
The solutions proposed there worked very well for a smaller data set; here I'm working with 7 .txt files totalling 750 MB. That shouldn't be too big, so I must be doing something wrong in the process.
```python
import pandas as pd

df1 = pd.read_csv('Data1.txt', skiprows=0, delimiter=' ',
                  usecols=[1, 2, 5, 7, 8, 10, 12, 13, 14])
df2 = pd.read_csv('Data2.txt', skiprows=0, delimiter=' ',
                  usecols=[1, 2, 5, 7, 8, 10, 12, 13, 14])
df3 = ...
df4 = ...
```

This is what one of my dataframes (df1) looks like. Head:
```
  name_profile  depth    VAR1  ...  year  month  day
0    profile_1    0.6  0.2044  ...  2012     11   26
1    profile_1    0.6  0.2044  ...  2012     11   26
2    profile_1    1.1  0.2044  ...  2012     11   26
3    profile_1    1.2  0.2044  ...  2012     11   26
4    profile_1    1.4  0.2044  ...  2012     11   26
```

And tail:
```
        name_profile       depth     VAR1  ...  year  month  day
955281  profile_1300  194.600006  0.01460  ...  2015      3   20
955282  profile_1300  195.800003  0.01095  ...  2015      3   20
955283  profile_1300  196.899994  0.01095  ...  2015      3   20
955284  profile_1300  198.100006  0.00730  ...  2015      3   20
955285  profile_1300  199.199997  0.01825  ...  2015      3   20
```

I followed a suggestion and dropped duplicates:
```python
df1.drop_duplicates()
```

...and so on for the other dataframes.
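Note that `drop_duplicates()` is not an in-place operation by default, so called as above its result is discarded and the duplicates survive. It has to be assigned back:

```python
# drop_duplicates() returns a new DataFrame; assign the result back
df1 = df1.drop_duplicates()

# or, equivalently, mutate the frame in place
df1.drop_duplicates(inplace=True)
```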
Similarly, df2 has VAR2, df3 has VAR3, and so on.
The approach below is adapted from one of the answers to the previous post. The aim is to create a single merged DataFrame with every VARX (one per dfX) as an additional column alongside depth, name_profile, year, month and day, so I tried something like this:
```python
dfs = [df.set_index(['depth', 'name_profile', 'year', 'month', 'day'])
       for df in [df1, df2, df3, df4, df5, df6, df7]]
df_merged = pd.concat(dfs, axis=1).reset_index()
```

The current error is:
```
ValueError: cannot handle a non-unique multi-index!
```
What am I doing wrong?
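For context, a minimal sketch of why this fails: `pd.concat(..., axis=1)` aligns rows by index label, and alignment is ambiguous when the same label occurs more than once. The frames `a` and `b` below are illustrative only, and the exact exception text varies with the pandas version (older versions raise the "cannot handle a non-unique multi-index!" message when the index is a MultiIndex):

```python
import pandas as pd

a = pd.DataFrame({'v': [10, 11, 12]}, index=[1, 1, 2])
b = pd.DataFrame({'w': [20, 21]}, index=[2, 3])

# Raises: the duplicated label 1 in a's index cannot be aligned
# unambiguously with b's rows
pd.concat([a, b], axis=1)
```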
- `reduce` is a very intensive process, as it nests with each iteration. Use `concat` instead.
- In `dfs2 = [dfs1, df3]`, `dfs1` is itself a list of dataframes. You perhaps wanted to `extend` the list, or `append` to it, not nest it.
- Either `drop_duplicates(...)`, or run an aggregation that picks the first row of each key pairing, e.g. `groupby(...).first()`.
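Putting that advice together, a minimal sketch (assuming that keeping the first row per key combination is acceptable for your data):

```python
import pandas as pd

keys = ['depth', 'name_profile', 'year', 'month', 'day']

# Collapse duplicate key combinations before concatenating:
# groupby(keys).first() keeps the first row for each unique key tuple,
# leaving a unique MultiIndex that concat can align on.
dfs = [df.groupby(keys).first()
       for df in [df1, df2, df3, df4, df5, df6, df7]]

# With unique indexes, axis=1 concatenation lines the VARX columns up
# side by side; reset_index() turns the keys back into ordinary columns.
df_merged = pd.concat(dfs, axis=1).reset_index()
```

If exact duplicate rows are the only problem, `df = df.drop_duplicates()` on each frame (note the assignment) achieves the same effect without aggregating.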