Pandas dropping columns by index drops all columns with same name

Question

Consider the following dataframe, which has columns with the same name (apparently this does happen; I currently have a dataset like this! :( )

>>> df = pd.DataFrame({"a":range(10,15),"b":range(5,10)})
>>> df.rename(columns={"b":"a"},inplace=True)
>>> df
    a  a
0  10  5
1  11  6
2  12  7
3  13  8
4  14  9
>>> df.columns
Index(['a', 'a'], dtype='object')

I would expect that when dropping by index, only the column at that index would be removed, but apparently this is not the case.

>>> df.drop(df.columns[-1],1)

0
1
2
3
4

Is there a way to get rid of columns with duplicated column names?

EDIT: I chose misleading values for the first column; fixed now.

EDIT2: the expected outcome is

    a
0  10
1  11
2  12
3  13
4  14

Indeed, I passed 1 to drop columns, and I also got an empty df (the 0 1 2 3 4 is the index). I was just expecting to get rid of the second column (the last one, containing values 5 to 9, hence the -1), so that the dataframe would not become empty but would keep index 0 to 4 and values 0 to 4. Sorry for choosing misleading values for the "a" column.
– redacted, Commented Mar 4, 2016 at 14:08
@Pocin "Is there a way to get rid of columns with duplicated column names?" All columns had duplicated names and you got rid of them. What else do you want?
– Stop harming Monica, Commented Mar 4, 2016 at 14:11
My bad, today is not my day. In the edit I added the expected outcome. The confusion springs from the fact that if I wanted to get rid of all columns with duplicated names, I would use df.drop("a",1). I wanted to bypass this by using integer column indices, but it had the same effect as df.drop("a",1).
– redacted, Commented Mar 4, 2016 at 14:14

EdChum · Accepted Answer · 2018-05-08 09:29:07Z

Actually just do this:

In [183]: df.ix[:,~df.columns.duplicated()]
Out[183]:
   a
0  0
1  1
2  2
3  3
4  4

This selects all rows and then uses the boolean column mask generated by duplicated(), inverted with ~, so that only the first column for each name is kept.

The output from duplicated:

In [184]: df.columns.duplicated()
Out[184]: array([False,  True], dtype=bool)
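To make the mask a bit more concrete, here is a minimal sketch that rebuilds the frame from the question and prints what duplicated() returns for each value of its keep parameter. The variable names (mask_first, mask_last, mask_all) are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Rebuild the frame from the question: two columns that both end up named "a".
df = pd.DataFrame({"a": range(10, 15), "b": range(5, 10)})
df.columns = ["a", "a"]

# Default keep="first": every repeat after the first occurrence is flagged.
mask_first = df.columns.duplicated()
# keep="last": every repeat before the last occurrence is flagged.
mask_last = df.columns.duplicated(keep="last")
# keep=False: every label that occurs more than once is flagged.
mask_all = df.columns.duplicated(keep=False)

print(mask_first.tolist())  # [False, True]
print(mask_last.tolist())   # [True, False]
print(mask_all.tolist())    # [True, True]
```

Inverting mask_first with ~ keeps the first column for each name, which is what the expression above relies on; inverting mask_last would keep the last one instead.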

UPDATE

As .ix is deprecated (since v0.20.1) and was removed in pandas 1.0, you should use either of the following:

df.iloc[:,~df.columns.duplicated()]

or

df.loc[:,~df.columns.duplicated()]

Thanks to @DavideFiocco for alerting me
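As for why the drop in the question removed both columns: df.columns[-1] evaluates to the label "a", and drop works on labels, so every column carrying that label goes. The sketch below contrasts that with purely positional selection through iloc, and checks it against the de-duplication mask. It assumes the frame from the question; the names keep_positions, dropped_by_label and dropped_by_position are made up for illustration.

```python
import pandas as pd

# Rebuild the frame from the question: two columns that both end up named "a".
df = pd.DataFrame({"a": range(10, 15), "b": range(5, 10)})
df.columns = ["a", "a"]

# df.columns[-1] is just the label "a", so drop() removes every column with that label.
last_label = df.columns[-1]
dropped_by_label = df.drop(columns=last_label)
print(dropped_by_label.shape)  # (5, 0)

# Positional alternative: keep every column position except the last one.
last_position = df.shape[1] - 1
keep_positions = [i for i in range(df.shape[1]) if i != last_position]
dropped_by_position = df.iloc[:, keep_positions]

# The mask-based de-duplication keeps the same single column for this frame.
deduplicated = df.loc[:, ~df.columns.duplicated()]
print(deduplicated.equals(dropped_by_position))  # True
```

The positional form is useful when you want to remove one specific column regardless of its name; the mask form is the better fit when the goal is to keep one column per name.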
