I am working on a large pandas DataFrame and have noticed that some columns hold the same value in every row, but under different column names. Some of the values are text or time-series data.

Is there an easy way to get rid of these duplicate columns and keep only the first occurrence of each?

Many thanks

  • Are the values partially duplicated or completely duplicated? Commented Jul 13, 2020 at 13:42
  • Completely, as far as I can see (300,000 rows), including the format. Commented Jul 13, 2020 at 13:44
  • Welcome to SO. Please read stackoverflow.com/help/mcve and post your attempted code. Commented Jul 13, 2020 at 13:45
  • Does this answer your question? python pandas remove duplicate columns Commented Jul 13, 2020 at 14:03

1 Answer

Let's create a dummy DataFrame in which two columns with different names hold identical values.

import pandas as pd

df = pd.DataFrame({
    'col1': [1, 2, 3, 'b', 5, 6],
    'col2': [11, 'a', 13, 14, 15, 16],
    'col3': [1, 2, 3, 'b', 5, 6],
})

  col1 col2 col3
0    1   11    1
1    2    a    2
2    3   13    3
3    b   14    b
4    5   15    5
5    6   16    6

To remove the duplicate columns, transpose the frame so that columns become rows, apply drop_duplicates (which keeps the first occurrence by default), and transpose back:

df.T.drop_duplicates().T 

result

  col1 col2
0    1   11
1    2    a
2    3   13
3    b   14
4    5   15
5    6   16
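
One caveat with the double transpose: on a frame with mixed column types, `df.T` upcasts everything to `object`, so the round trip can lose the original dtypes. A minimal sketch of an alternative, assuming the same dummy `df` as above, uses the transpose only to build a boolean mask and then selects columns from the original frame. The helper name `drop_duplicate_columns` is hypothetical, not a pandas function.

```python
import pandas as pd


def drop_duplicate_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` keeping the first of each set of identical columns."""
    # duplicated() on the transpose flags every column whose values repeat
    # an earlier column; column names are ignored, only values are compared.
    is_duplicate = frame.T.duplicated(keep="first")
    # Select from the original frame so each kept column retains its dtype.
    return frame.loc[:, ~is_duplicate].copy()


df = pd.DataFrame({
    "col1": [1, 2, 3, "b", 5, 6],
    "col2": [11, "a", 13, 14, 15, 16],
    "col3": [1, 2, 3, "b", 5, 6],
})
deduped = drop_duplicate_columns(df)
print(list(deduped.columns))
```

The transpose still has to be built once, so on a frame with 300,000 rows this is not free, but the returned columns are taken directly from the original data.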

4 Comments

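Thanks Talha. Is there no need to put an inplace=True somewhere to actually modify the original df?
The chained expression returns a new DataFrame, so assign it back if you want to replace the original: df = df.T.drop_duplicates().T
Where exactly can I place inplace=True?
Not inside this chain. df.T.drop_duplicates(inplace=True).T fails, because drop_duplicates(inplace=True) returns None, and it would only alter the temporary transposed frame anyway. Use the reassignment instead.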
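
If the original object really has to be changed in place (for example because other variables refer to it), one option is to find the duplicate column names first and pass them to `DataFrame.drop`. This is a sketch under the assumption that `df` is the dummy frame from the answer; the names `same_object` and `duplicate_names` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 2, 3, "b", 5, 6],
    "col2": [11, "a", 13, 14, 15, 16],
    "col3": [1, 2, 3, "b", 5, 6],
})
same_object = df  # second reference, to show the change is visible through it

# Names of the columns whose values repeat an earlier column.
duplicate_names = df.columns[df.T.duplicated().to_numpy()]

# drop(..., inplace=True) mutates df itself instead of building a new frame.
df.drop(columns=duplicate_names, inplace=True)

print(list(same_object.columns))
```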
