I would like to find a way in Python to merge the two files on 'seq' but keep every row of file one whose id belongs to a matching sequence. In this example, only the rows with id 2 would be removed.

File one:

    seq,id
    CSVGPPNNEQFF,0
    CTVGPPNNEQFF,0
    CTVGPPNNERFF,0
    CASRGEAAGFYEQYF,1
    RASRGEAAGFYEQYF,1
    CASRGGAAGFYEQYF,1
    CASSDLILYYEQYF,2
    CASSDLILYYTQYF,2
    CASSGSYEQYF,3
    CASSGSYEQYY,3

File two:

    seq
    CSVGPPNNEQFF
    CASRGEAAGFYEQYF
    CASSGSYEQYY

Output:

    seq,id
    CSVGPPNNEQFF,0
    CTVGPPNNEQFF,0
    CTVGPPNNERFF,0
    CASRGEAAGFYEQYF,1
    RASRGEAAGFYEQYF,1
    CASRGGAAGFYEQYF,1
    CASSGSYEQYF,3
    CASSGSYEQYY,3

I have tried:

    df3 = df1.merge(df2.groupby('seq', as_index=False)[['seq']].agg(','.join), how='right')

Output:

    seq,id
    CASRGEAAGFYEQYF,1
    CASSGSYEQYY,3
    CSVGPPNNEQFF,0

Does anyone have any advice on how to solve this?
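
For reference, here is a minimal sketch of one way the desired output could be produced. It is not a merge: it assumes the goal is to collect the ids of the sequences that appear in file two and then keep every row of file one carrying one of those ids. The names `file_one`, `file_two` and `matched_ids` are made up for illustration, and the sample data is inlined instead of being read from disk.

```python
import io

import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
file_one = """seq,id
CSVGPPNNEQFF,0
CTVGPPNNEQFF,0
CTVGPPNNERFF,0
CASRGEAAGFYEQYF,1
RASRGEAAGFYEQYF,1
CASRGGAAGFYEQYF,1
CASSDLILYYEQYF,2
CASSDLILYYTQYF,2
CASSGSYEQYF,3
CASSGSYEQYY,3
"""
file_two = """seq
CSVGPPNNEQFF
CASRGEAAGFYEQYF
CASSGSYEQYY
"""

df1 = pd.read_csv(io.StringIO(file_one))
df2 = pd.read_csv(io.StringIO(file_two))

# Step 1: ids of the rows in df1 whose sequence also appears in df2.
matched_ids = df1.loc[df1["seq"].isin(df2["seq"]), "id"].unique()

# Step 2: keep every row of df1 that carries one of those ids.
df3 = df1[df1["id"].isin(matched_ids)].reset_index(drop=True)

print(df3.to_csv(index=False))
```

With the sample data, this keeps the eight rows with ids 0, 1 and 3 and drops the two rows with id 2, since neither of their sequences appears in file two.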