
I have a DataFrame df with a column "Id":

                Id
    0  -KkJz3CoJNM
    1  08QMXEQbEWw
    2  0ANuuVrIWJw
    3  0pPU8CtwXTo
    4  1-wYH2LEcmk

I need to convert column "Id" into a set, but

    set_id = set(df["Id"])
    print(set_id)

returns

    {'Id'}

instead of a set of the strings in column "Id". Why?

1 Answer


For me it works correctly if there is only one "Id" column:

    set_id = set(df["Id"])
    print(set_id)
    {'1-wYH2LEcmk', '08QMXEQbEWw', '0pPU8CtwXTo', '0ANuuVrIWJw', '-KkJz3CoJNM'}
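A minimal self-contained reproduction of the single-column case, assuming pandas is installed and rebuilding df from the sample values in the question. With one "Id" column, df["Id"] is a Series, and iterating a Series yields its values:

```python
import pandas as pd

# Sample data copied from the question: exactly one column named "Id"
df = pd.DataFrame({"Id": ["-KkJz3CoJNM", "08QMXEQbEWw", "0ANuuVrIWJw",
                          "0pPU8CtwXTo", "1-wYH2LEcmk"]})

# A unique label selects a Series; iterating a Series yields its values
print(type(df["Id"]).__name__)  # Series

set_id = set(df["Id"])  # set of the five id strings (set order is arbitrary)
print(set_id)
```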

But if there are several columns named "Id", df["Id"] returns a DataFrame instead of a Series, so set(df["Id"]) returns the unique column names:

    # test for 2 columns with sample data
    df = pd.concat([df, df], axis=1)
    print(df["Id"])
                Id           Id
    0  -KkJz3CoJNM  -KkJz3CoJNM
    1  08QMXEQbEWw  08QMXEQbEWw
    2  0ANuuVrIWJw  0ANuuVrIWJw
    3  0pPU8CtwXTo  0pPU8CtwXTo
    4  1-wYH2LEcmk  1-wYH2LEcmk

    set_id = set(df["Id"])
    print(set_id)
    {'Id'}
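To confirm what the selection actually is, a minimal reproduction (assuming pandas is installed; the sample values are copied from the question, and concatenating the frame with itself is just one way to end up with duplicate column names):

```python
import pandas as pd

ids = ["-KkJz3CoJNM", "08QMXEQbEWw", "0ANuuVrIWJw", "0pPU8CtwXTo", "1-wYH2LEcmk"]
df = pd.DataFrame({"Id": ids})

# Side-by-side concat produces two columns that are both labelled "Id"
df = pd.concat([df, df], axis=1)

selected = df["Id"]
# A label matching more than one column selects a DataFrame, not a Series
print(type(selected).__name__)  # DataFrame
# Iterating that DataFrame yields column labels, which collapse to one in a set
print(set(selected))  # {'Id'}
```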

That happens because iterating over a DataFrame yields its column labels, so:

    L = list(df["Id"])
    print(L)
    ['Id', 'Id']

works the same as

    L = list(df["Id"].columns)
    print(L)
    ['Id', 'Id']

and the same applies to sets:

    set_id = set(df["Id"].columns)
    print(set_id)
    {'Id'}
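The underlying rule is that a DataFrame iterates like a dict: over its keys (column labels), not over its cells. A small sketch on a hypothetical frame with columns a and b, assuming pandas is installed:

```python
import pandas as pd

# Hypothetical frame with distinct labels so the iteration order is easy to see
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Iterating the DataFrame itself walks the column labels, like dict keys
print(list(df))          # ['a', 'b']
print(list(df.columns))  # ['a', 'b']

# To get cell values, iterate a single column (a Series) instead
values_a = set(df["a"])
print(values_a)
```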

A possible solution is to deduplicate the column names:

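    c = df.columns.to_series()
    # number repeated labels per name (0, 1, ...), turn the numbers into '.0', '.1', ...
    # and blank out '.0' so the first occurrence keeps its original name
    df.columns += c.groupby(c).cumcount().astype(str).radd('.').replace('.0', '')
    print(df)
                Id         Id.1
    0  -KkJz3CoJNM  -KkJz3CoJNM
    1  08QMXEQbEWw  08QMXEQbEWw
    2  0ANuuVrIWJw  0ANuuVrIWJw
    3  0pPU8CtwXTo  0pPU8CtwXTo
    4  1-wYH2LEcmk  1-wYH2LEcmk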
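If the groupby/cumcount one-liner is hard to read, an explicit loop gives the same Id, Id.1 naming for repeated labels. This is a sketch: dedupe_columns is a hypothetical helper, not a pandas function, and it does not guard against a generated name colliding with an existing label such as a pre-existing "Id.1".

```python
import pandas as pd

def dedupe_columns(columns):
    """Hypothetical helper: keep the first label as-is, suffix repeats with .1, .2, ..."""
    seen = {}
    result = []
    for name in columns:
        count = seen.get(name, 0)
        result.append(name if count == 0 else f"{name}.{count}")
        seen[name] = count + 1
    return result

ids = ["-KkJz3CoJNM", "08QMXEQbEWw", "0ANuuVrIWJw", "0pPU8CtwXTo", "1-wYH2LEcmk"]
# Two side-by-side copies reproduce the duplicated "Id" column
df = pd.concat([pd.DataFrame({"Id": ids})] * 2, axis=1)

df.columns = dedupe_columns(df.columns)
print(list(df.columns))  # ['Id', 'Id.1']

# "Id" is unique again, so df["Id"] is a Series and set() sees the values
set_id = set(df["Id"])
```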

Or, if the duplicated columns always hold the same values, remove the duplicates:

    df = df.loc[:, ~df.columns.duplicated()]
    print(df)
                Id
    0  -KkJz3CoJNM
    1  08QMXEQbEWw
    2  0ANuuVrIWJw
    3  0pPU8CtwXTo
    4  1-wYH2LEcmk
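A self-contained sketch of that route, assuming pandas is installed. The second part is an alternative for the case where the duplicated "Id" columns might hold different values and you want ids from all of them (an assumption beyond the original question): flatten the 2-D selection before building the set.

```python
import pandas as pd

ids = ["-KkJz3CoJNM", "08QMXEQbEWw", "0ANuuVrIWJw", "0pPU8CtwXTo", "1-wYH2LEcmk"]
df = pd.concat([pd.DataFrame({"Id": ids})] * 2, axis=1)  # columns: Id, Id

# columns.duplicated() flags every repeat of a label after its first occurrence;
# keeping the unflagged ones leaves a single "Id" column
deduped = df.loc[:, ~df.columns.duplicated()]
set_id = set(deduped["Id"])

# Alternative: collect values from every "Id" column by flattening the 2-D array
all_ids = set(df["Id"].to_numpy().ravel())
```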

7 Comments

I do have "Id" twice for some odd reason. But why doesn't df = df.drop_duplicates() work? I still have "Id" twice.
@Vega - Then you need df = df.loc[:, ~df.columns.duplicated()].
Your solution seems to work, but how can .drop_duplicates() not work? Isn't that the exact use case for it?
@Vega - OK, it works if you transpose, like df = df.T.drop_duplicates().T, because drop_duplicates removes duplicate rows, not duplicate columns.
@Vega - Because there is no df.drop_duplicates(axis=1).
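The two approaches from the comments are not equivalent, which a small hypothetical frame makes visible (the extra "Copy" column is an assumption added for illustration). df.columns.duplicated() compares column names, while df.T.drop_duplicates().T compares column values, so it also drops a differently named column with identical contents. Transposing also upcasts mixed dtypes to object, which is worth keeping in mind on real data.

```python
import pandas as pd

ids = ["-KkJz3CoJNM", "08QMXEQbEWw", "0ANuuVrIWJw", "0pPU8CtwXTo", "1-wYH2LEcmk"]
base = pd.DataFrame({"Id": ids})
# Hypothetical frame: "Id" twice, plus "Copy" holding the same values under another name
df = pd.concat([base, base, base.rename(columns={"Id": "Copy"})], axis=1)

# drop_duplicates compares rows; every row here is distinct, so nothing is removed
by_rows = df.drop_duplicates()
print(list(by_rows.columns))  # ['Id', 'Id', 'Copy']

# Transposing turns columns into rows, so duplicate *values* are removed
by_values = df.T.drop_duplicates().T
print(list(by_values.columns))  # ['Id']

# duplicated() on the column index removes duplicate *names* only
by_names = df.loc[:, ~df.columns.duplicated()]
print(list(by_names.columns))  # ['Id', 'Copy']
```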