
Consider the following dataframe, which has columns with the same name (apparently this does happen; I currently have a dataset like this! :( )

    >>> df = pd.DataFrame({"a":range(10,15),"b":range(5,10)})
    >>> df.rename(columns={"b":"a"},inplace=True)
    >>> df
        a  a
    0  10  5
    1  11  6
    2  12  7
    3  13  8
    4  14  9
    >>> df.columns
    Index(['a', 'a'], dtype='object')

I would expect that when dropping by index, only the column at that index would be removed, but apparently this is not the case.

    >>> df.drop(df.columns[-1],1)

    0
    1
    2
    3
    4

Is there a way to get rid of columns with duplicated column names?

EDIT: I chose misleading values for the first column; fixed now.

EDIT2: the expected outcome is

        a
    0  10
    1  11
    2  12
    3  13
    4  14
  • Indeed, I passed 1 to drop columns. I also got an empty df (the 01234 is the index). I was just expecting to get rid of the second column (the last one, containing values 5 to 9, hence the -1), so that the dataframe would not become empty but would keep index 0 to 4 and values 0 to 4. Sorry for choosing misleading values for the "a" column. Commented Mar 4, 2016 at 14:08
  • @Pocin "Is there a way to get rid of columns with duplicated column names?" All columns had duplicated names and you got rid of them; what else do you want? Commented Mar 4, 2016 at 14:11
  • My bad, today is not my day. In the edit I added the expected outcome. The confusion springs from the fact that if I wanted to get rid of all columns with duplicated names, I would use df.drop("a",1). I wanted to bypass this by using integer column indices, but it had the same effect as df.drop("a",1). Commented Mar 4, 2016 at 14:14

1 Answer


Actually just do this:

    In [183]: df.ix[:,~df.columns.duplicated()]
    Out[183]:
       a
    0  0
    1  1
    2  2
    3  3
    4  4

This selects all rows, then selects columns with the boolean mask generated by duplicated, inverted using ~, so only the first occurrence of each column name is kept.
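To see why the mask is needed at all: df.columns[-1] evaluates to the label "a", not to a position, so drop matches every column carrying that label. A minimal sketch, rebuilding the question's frame (assigning df.columns directly is just a shortcut chosen here, and it uses the drop(columns=...) keyword form rather than the positional axis from the question):

```python
import pandas as pd

# Rebuild the question's frame with two columns that share the label "a".
df = pd.DataFrame({"a": range(10, 15), "b": range(5, 10)})
df.columns = ["a", "a"]

# df.columns[-1] is the label "a", so drop() removes both columns.
label = df.columns[-1]
dropped = df.drop(columns=label)
print(dropped.shape)  # (5, 0)

# The inverted duplicated() mask keeps only the first occurrence of each label.
mask = ~df.columns.duplicated()
deduped = df.loc[:, mask]
print(deduped)
```

Because the mask is positional (one boolean per column), it sidesteps the label ambiguity entirely.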

The output from duplicated:

    In [184]: df.columns.duplicated()
    Out[184]: array([False,  True], dtype=bool)
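Which occurrence survives depends on the keep argument of duplicated. The sketch below uses a made-up three-column index (not the question's data) to compare the options:

```python
import pandas as pd

# Hypothetical column labels, chosen only to illustrate the keep options.
cols = pd.Index(["a", "b", "a"])

first = cols.duplicated()             # default keep="first": later repeats are flagged
last = cols.duplicated(keep="last")   # earlier repeats are flagged
every = cols.duplicated(keep=False)   # every label that occurs more than once is flagged

print(first.tolist())  # [False, False, True]
print(last.tolist())   # [True, False, False]
print(every.tolist())  # [True, False, True]
```

Inverting the keep=False mask would drop every column whose name is repeated, which is closer to what df.drop("a", 1) does for a single label.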

UPDATE

As .ix is deprecated (since v0.20.1) and was removed in pandas 1.0, you should use either of the following:

df.iloc[:,~df.columns.duplicated()] 

or

df.loc[:,~df.columns.duplicated()] 

Thanks to @DavideFiocco for alerting me
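If the goal is to remove one column by its integer position, as the question originally attempted, rather than all repeats of a name, the same boolean-mask idea should work with iloc. This is only a sketch: drop_column_at is a hypothetical helper written for illustration, not a pandas function, and the two-row frame is made up.

```python
import numpy as np
import pandas as pd


def drop_column_at(frame, position):
    """Hypothetical helper: select every column except the one at `position`."""
    keep = np.ones(frame.shape[1], dtype=bool)  # start by keeping all columns
    keep[position] = False                      # negative positions work like list indices
    return frame.iloc[:, keep]


# Small made-up frame with two columns that share the label "a".
df = pd.DataFrame([[10, 5], [11, 6]], columns=["a", "a"])
result = drop_column_at(df, -1)
print(result)
```

Unlike drop, this never looks at the labels, so duplicated names do not matter.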
