Removing columns in Pandas

Question

I am working with a large pandas DataFrame and noticed that some columns hold the same values in every row, but under different column names. Some of the values are text or time series data.

Is there an easy way to get rid of these duplicate columns and keep the first one each time?

Many thanks

Are the values partially duplicated or completely duplicated?
– Talha Anwar, Commented Jul 13, 2020 at 13:42
Completely, as far as I can see (300,000 rows), including the format.
– Pierre Kovatcheva, Commented Jul 13, 2020 at 13:44
Welcome to SO. Please read stackoverflow.com/help/mcve and post your attempted code.
– Bussller, Commented Jul 13, 2020 at 13:45
Does this answer your question? python pandas remove duplicate columns
– Niko Fohr, Commented Jul 13, 2020 at 14:03

Talha Anwar · Accepted Answer · 2020-07-13 13:49:40Z

Let's create a dummy DataFrame in which two columns with different names hold duplicate values.

    import pandas as pd

    df = pd.DataFrame({
        'col1': [1, 2, 3, 'b', 5, 6],
        'col2': [11, 'a', 13, 14, 15, 16],
        'col3': [1, 2, 3, 'b', 5, 6],
    })

      col1 col2 col3
    0    1   11    1
    1    2    a    2
    2    3   13    3
    3    b   14    b
    4    5   15    5
    5    6   16    6

To remove the duplicate columns, first transpose the frame, then apply drop_duplicates (which keeps the first occurrence by default), and finally transpose back:

    df.T.drop_duplicates().T

Result:

      col1 col2
    0    1   11
    1    2    a
    2    3   13
    3    b   14
    4    5   15
    5    6   16
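
One caveat with the double transpose: on a frame with mixed types, the columns come back as object dtype. The sketch below is a variant of the same idea, not the answer's exact code. It uses the transpose only to find which columns repeat an earlier one, then selects the survivors from the original frame so their dtypes are kept. The helper name drop_duplicate_columns and the when / when_copy columns are made up for illustration.

```python
import pandas as pd


def drop_duplicate_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of frame without columns that repeat an earlier column.

    Hypothetical helper for illustration; not part of pandas.
    """
    # After transposing, each original column is a row, so duplicated()
    # marks every column whose values match an earlier column.
    is_repeat = frame.T.duplicated(keep="first")
    # Pick the surviving columns from the original frame to keep its dtypes.
    return frame.loc[:, ~is_repeat.to_numpy()].copy()


df = pd.DataFrame({
    "col1": [1, 2, 3, "b", 5, 6],
    "col2": [11, "a", 13, 14, 15, 16],
    "col3": [1, 2, 3, "b", 5, 6],
    # Illustrative time series columns, assumed for this example only.
    "when": pd.date_range("2020-07-13", periods=6, freq="D"),
    "when_copy": pd.date_range("2020-07-13", periods=6, freq="D"),
})

deduped = drop_duplicate_columns(df)
print(list(deduped.columns))
print(deduped.dtypes)
```

Transposing still builds a full copy of the data, so on a frame with 300,000 rows this may be slow or memory-hungry; it is worth timing on a sample first.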

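Thanks Talha. Is there no need to put an inplace=True somewhere to permanently modify the original df?
No inplace=True is needed, and it would not work in this chain: drop_duplicates(inplace=True) returns None, so the final .T would fail. To replace the original df instead of creating a new one, assign the result back: df = df.T.drop_duplicates().T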
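
On replacing the original df: here is a minimal sketch, assuming the same dummy frame as in the answer. It shows that the chained expression returns a new frame and leaves df untouched, and that assigning the result back to the name df is one straightforward way to overwrite the original. The variable names columns_before and columns_after are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 2, 3, "b", 5, 6],
    "col2": [11, "a", 13, 14, 15, 16],
    "col3": [1, 2, 3, "b", 5, 6],
})

# The chained expression builds a new frame; df itself is not modified.
result = df.T.drop_duplicates().T
columns_before = list(df.columns)

# Rebinding the name replaces the original with the de-duplicated frame.
df = df.T.drop_duplicates().T
columns_after = list(df.columns)

print(columns_before, columns_after)
```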
