
I have two dataframes df1 and df2 with key as index.

import pandas as pd

dict_1 = {'key': [1, 1, 1, 2, 2, 3], 'col1': ['a1', 'b1', 'c1', 'd1', 'e1', 'f1']}
df1 = pd.DataFrame(dict_1).set_index('key')
dict_2 = {'key': [1, 1, 2], 'col2': ['a2', 'b2', 'c2']}
df2 = pd.DataFrame(dict_2).set_index('key')

df1:

    col1
key
1     a1
1     b1
1     c1
2     d1
2     e1
3     f1

df2:

    col2
key
1     a2
1     b2
2     c2

Note that the two dataframes have a different number of rows for each key. I want to concatenate them to obtain the following dataframe (say df3).

df3:

    col1 col2
key
1     a1   a2
1     b1   b2
2     d1   c2

i.e. concatenate the two columns so that, for each key, the new dataframe has as many rows as the smaller of df1 and df2 has for that key.

I tried

pd.concat([df1,df2],axis=1) 

but I get the following error:

ValueError: Shape of passed values is (2,17), indices imply (2,7)

My question: How can I concatenate df1 and df2 to get df3? Should I use DataFrame.merge instead? If so, how?

2 Answers

Merge/join alone will get you a lot of (hard to get rid of) duplicates. But a little trick will help:

df1['count1'] = 1
df1['count1'] = df1['count1'].groupby(df1.index).cumsum()
df1
Out[198]:
    col1  count1
key
1     a1       1
1     b1       2
1     c1       3
2     d1       1
2     e1       2
3     f1       1

The same thing for df2:

df2['count2'] = 1
df2['count2'] = df2['count2'].groupby(df2.index).cumsum()

And finally:

df_aligned = df1.reset_index().merge(df2.reset_index(), left_on=['key', 'count1'], right_on=['key', 'count2'])
df_aligned
Out[199]:
   key col1  count1 col2  count2
0    1   a1       1   a2       1
1    1   b1       2   b2       2
2    2   d1       1   c2       1

Now you can restore the index with set_index('key') and drop the count1 and count2 columns, which are no longer needed.
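For reference, the whole trick can be sketched in one piece. This is only an illustration under a few assumptions: the helper name align_on_occurrence and the temporary column name occurrence are made up here, and groupby(...).cumcount() (which counts from 0) stands in for the column of ones plus cumsum() used above. Both dataframes are assumed to have an index named key.

```python
import pandas as pd

df1 = pd.DataFrame({'key': [1, 1, 1, 2, 2, 3],
                    'col1': ['a1', 'b1', 'c1', 'd1', 'e1', 'f1']}).set_index('key')
df2 = pd.DataFrame({'key': [1, 1, 2],
                    'col2': ['a2', 'b2', 'c2']}).set_index('key')


def align_on_occurrence(left, right):
    """Hypothetical helper: pair the n-th row of each key in `left`
    with the n-th row of the same key in `right`."""
    # Number the rows within each key: 0, 1, 2, ...
    # to_numpy() sidesteps index alignment on the duplicated index.
    left = left.assign(occurrence=left.groupby(level=0).cumcount().to_numpy())
    right = right.assign(occurrence=right.groupby(level=0).cumcount().to_numpy())
    # (key, occurrence) is unique on both sides, so the inner merge
    # keeps only the pairs that exist in both dataframes.
    merged = left.reset_index().merge(right.reset_index(),
                                      on=['key', 'occurrence'], how='inner')
    # Drop the helper column and restore the original index.
    return merged.drop(columns='occurrence').set_index('key')


df3 = align_on_occurrence(df1, df2)
print(df3)
```

Because the pair (key, occurrence) is unique in each dataframe, the merge cannot multiply rows, which is what removes the duplicates.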


2 Comments

Thanks for the 'little trick'. That's the missing link that I couldn't comprehend. I accept this as the answer.
@Bhushan Glad I could help.

The biggest obstacle to lining the two up the way you want is that your keys are duplicated. How are you going to line up the a1 value in df1 with the a2 value in df2 when a1, a2, b1, b2, and c1 all have the same key?

Using merge is what you'll want if you can resolve the key issues:

df3 = df1.merge(df2, left_index=True, right_index=True, how='inner') 

You can use inner, outer, left or right for how.
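To see what the duplicated keys do to a plain index merge, here is a small sketch using the dataframes from the question. The variable names crossed and rows_per_key are illustrative only. Every df1 row for a key is paired with every df2 row for that key, so the result is larger than the desired df3.

```python
import pandas as pd

df1 = pd.DataFrame({'key': [1, 1, 1, 2, 2, 3],
                    'col1': ['a1', 'b1', 'c1', 'd1', 'e1', 'f1']}).set_index('key')
df2 = pd.DataFrame({'key': [1, 1, 2],
                    'col2': ['a2', 'b2', 'c2']}).set_index('key')

# Inner merge on the (non-unique) index.
crossed = df1.merge(df2, left_index=True, right_index=True, how='inner')

# Count the rows produced for each key.
rows_per_key = crossed.groupby(level=0).size()
print(rows_per_key)
# key 1 yields 3 x 2 = 6 rows and key 2 yields 2 x 1 = 2 rows, 8 in total,
# whereas the desired df3 has only 3 rows.
print(len(crossed))
```

Making the join key unique, for example by adding a per-key row counter to both sides, is one way to resolve this.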

1 Comment

Duplicate keys are part of the dataframe. Here is the larger context: 'id1' is the index of a buffer id [say: 0 to 255], and 'col1' and 'col2' are timestamps when the buffer is acquired and released. Buffers are going to be acquired/released over a large time window, and ptrj's answer helped me reorder the dataframe such that I can calculate the time duration in which a buffer is in use.
