pandas merge dataframes: keep all rows when one of the columns matches another dataframe

Question

I would like to find a way in Python to merge the files on 'seq' but return all rows that share an id with a matching row. In this example, only the lines with id 2 would be removed.

File one:

seq,id
CSVGPPNNEQFF,0
CTVGPPNNEQFF,0
CTVGPPNNERFF,0
CASRGEAAGFYEQYF,1
RASRGEAAGFYEQYF,1
CASRGGAAGFYEQYF,1
CASSDLILYYEQYF,2
CASSDLILYYTQYF,2
CASSGSYEQYF,3
CASSGSYEQYY,3

File two:

seq
CSVGPPNNEQFF
CASRGEAAGFYEQYF
CASSGSYEQYY

Output:

seq,id
CSVGPPNNEQFF,0
CTVGPPNNEQFF,0
CTVGPPNNERFF,0
CASRGEAAGFYEQYF,1
RASRGEAAGFYEQYF,1
CASRGGAAGFYEQYF,1
CASSGSYEQYF,3
CASSGSYEQYY,3

I have tried:

df3 = df1.merge(df2.groupby('seq',as_index=False)[['seq']].agg(','.join),how='right')

output:

seq,id
CASRGEAAGFYEQYF,1
CASSGSYEQYY,3
CSVGPPNNEQFF,0

Does anyone have any advice on how to solve this?

druskacik · Accepted Answer · 2022-04-26 09:41:55Z

Do you want to merge the two dataframes, or just take the subset of the first dataframe whose id appears in the second dataframe (matched by seq)? Either way, this gives the required result.

import pandas as pd

df1 = pd.DataFrame({
    'seq': [
        'CSVGPPNNEQFF', 'CTVGPPNNEQFF', 'CTVGPPNNERFF',
        'CASRGEAAGFYEQYF', 'RASRGEAAGFYEQYF', 'CASRGGAAGFYEQYF',
        'CASSDLILYYEQYF', 'CASSDLILYYTQYF',
        'CASSGSYEQYF', 'CASSGSYEQYY'
    ],
    'id': [0, 0, 0, 1, 1, 1, 2, 2, 3, 3]
})

df2 = pd.DataFrame({
    'seq': ['CSVGPPNNEQFF', 'CASRGEAAGFYEQYF', 'CASSGSYEQYY']
})

# Keep every row of df1 whose id has at least one seq present in df2
df3 = df1.loc[df1['id'].isin(df1['id'][df1['seq'].isin(df2['seq'])])]

Explanation: df1['id'][df1['seq'].isin(df2['seq'])] selects the id values of the rows in df1 whose seq appears in df2. All rows of df1 with one of those id values are then kept, so every id group with at least one match survives.
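The same idea can also be written as a per-group check. The following is only a sketch of an alternative, not part of the original answer: it flags the rows whose seq is in df2, then uses groupby(...).transform('any') to spread that flag to every row with the same id. The names hit and keep are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'seq': [
        'CSVGPPNNEQFF', 'CTVGPPNNEQFF', 'CTVGPPNNERFF',
        'CASRGEAAGFYEQYF', 'RASRGEAAGFYEQYF', 'CASRGGAAGFYEQYF',
        'CASSDLILYYEQYF', 'CASSDLILYYTQYF',
        'CASSGSYEQYF', 'CASSGSYEQYY'
    ],
    'id': [0, 0, 0, 1, 1, 1, 2, 2, 3, 3]
})
df2 = pd.DataFrame({
    'seq': ['CSVGPPNNEQFF', 'CASRGEAAGFYEQYF', 'CASSGSYEQYY']
})

# Step 1: True for each row of df1 whose seq appears in df2
hit = df1['seq'].isin(df2['seq'])

# Step 2: within each id group, mark every row True if any row matched
keep = hit.groupby(df1['id']).transform('any')

# Step 3: boolean-index df1 with the group-level mask
df3 = df1[keep]
print(df3)
```

On this data the result should match the isin-based one-liner above. Which form reads better is a matter of taste; the groupby version makes the "any match in the group" condition explicit.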

Léo Beaucourt · Accepted Answer · 2022-04-26 09:15:18Z

You can use the pandas isin() method. The code should look as follows:

df1.loc[df1['seq'].isin(df2['seq'])]

This assumes both objects are pandas DataFrames and 'seq' is a column in each.

2 Comments

Xela Vi Over a year ago

The output of this command is exactly like what I had with the df.merge command.

Léo Beaucourt Over a year ago

My bad, I didn't get the point about 'id' in your question. The other answer given is the right one.
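To make the difference between the two answers concrete, here is a small sketch using the data from the question. The variable names row_level, matched_ids and id_level are chosen for illustration only. Filtering on seq alone keeps just the matching rows, while filtering on the id values of those rows keeps the whole group.

```python
import pandas as pd

df1 = pd.DataFrame({
    'seq': [
        'CSVGPPNNEQFF', 'CTVGPPNNEQFF', 'CTVGPPNNERFF',
        'CASRGEAAGFYEQYF', 'RASRGEAAGFYEQYF', 'CASRGGAAGFYEQYF',
        'CASSDLILYYEQYF', 'CASSDLILYYTQYF',
        'CASSGSYEQYF', 'CASSGSYEQYY'
    ],
    'id': [0, 0, 0, 1, 1, 1, 2, 2, 3, 3]
})
df2 = pd.DataFrame({
    'seq': ['CSVGPPNNEQFF', 'CASRGEAAGFYEQYF', 'CASSGSYEQYY']
})

# Row-level filter: only rows whose own seq is in df2
row_level = df1.loc[df1['seq'].isin(df2['seq'])]

# Id-level filter: every row whose id belongs to a matching row
matched_ids = row_level['id'].unique()
id_level = df1.loc[df1['id'].isin(matched_ids)]

print(len(row_level))  # 3 rows, same as the merge attempt in the question
print(len(id_level))   # 8 rows, the desired output (id 2 dropped)
```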
