Why can't I chain the get_dummies() function?

import pandas as pd

df = (pd
      .read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
      .drop(columns=['sepal_length'])
      .get_dummies()
      )

This works fine:

df = (pd
      .read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
      .drop(columns=['sepal_length'])
      )
df = pd.get_dummies(df)
  • Please add the dataset / sample dataset that you are using. Commented Sep 7, 2021 at 14:01
  • Because get_dummies is a pandas function, not a DataFrame method. In the first attempt, you're trying to call that function as if it were a DataFrame method. Commented Sep 7, 2021 at 14:02
  • Is there any way to chain the function? Commented Sep 7, 2021 at 14:03
  • @Nivel I don't think so! Commented Sep 7, 2021 at 14:10
  • You can actually chain it with pd.Series.str.get_dummies(), which is a Series method. Details in my answer. Commented Sep 7, 2021 at 14:13

2 Answers

3

DataFrame.pipe can be helpful in chaining methods or function calls which are not natively attached to the DataFrame, like pd.get_dummies:

df = df.drop(columns=['sepal_length']).pipe(pd.get_dummies) 

Or with lambda:

df = (
    df.drop(columns=['sepal_length'])
    .pipe(lambda current_df: pd.get_dummies(current_df))
)

Sample DataFrame:

df = pd.DataFrame({'sepal_length': 1, 'a': list('ABACC'), 'b': list('ACCAB')}) 

df:

   sepal_length  a  b
0             1  A  A
1             1  B  C
2             1  A  C
3             1  C  A
4             1  C  B

Sample Output:

df = df.drop(columns=['sepal_length']).pipe(pd.get_dummies) 

df:

   a_A  a_B  a_C  b_A  b_B  b_C
0    1    0    0    1    0    0
1    0    1    0    0    0    1
2    1    0    0    0    0    1
3    0    0    1    1    0    0
4    0    0    1    0    1    0
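
For completeness, here is a minimal self-contained sketch of the same idea. It relies on `DataFrame.pipe` forwarding extra keyword arguments to the function it calls, so options such as `dtype` can be passed inline. The name `encoded` is only illustrative, and `dtype=int` is my own addition: newer pandas releases may return True/False columns by default, so it is passed here to get the 0/1 values shown above.

```python
import pandas as pd

# Same sample frame as above
df = pd.DataFrame({'sepal_length': 1, 'a': list('ABACC'), 'b': list('ACCAB')})

encoded = (
    df
    .drop(columns=['sepal_length'])
    # pipe calls pd.get_dummies(<frame>, dtype=int) and returns its result
    .pipe(pd.get_dummies, dtype=int)
)

print(encoded)
```

Because `pipe` returns whatever the function returns, further DataFrame methods can be chained after it as usual.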

0

You can't chain pd.get_dummies() because it is a top-level pandas function, not a pd.DataFrame method. However, assuming:

  1. You have a single column left after you drop your columns in the previous step in the chain.
  2. That column has a string dtype.

... you can use pd.Series.str.get_dummies(), which is a Series-level method.

### Dummy Dataframe
#    A  B
# 0  1  x
# 1  2  y
# 2  3  z

pd.read_csv(path).drop(columns=['A'])['B'].str.get_dummies()

   x  y  z
0  1  0  0
1  0  1  0
2  0  0  1

NOTE: Make sure the object is a Series before you call get_dummies() on it. In this case I select column ['B'] to get one, which makes the preceding pd.DataFrame.drop() call unnecessary :)

But this is only for example's sake.
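
Here is a runnable sketch of the same chain, assuming the dummy frame is built in memory instead of being read from `path` (the variable name `dummies` is only illustrative). One thing to keep in mind: `Series.str.get_dummies()` splits each string on `'|'` by default, so values containing that character would produce several indicator columns.

```python
import pandas as pd

# In-memory stand-in for the dummy CSV above
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'z']})

dummies = (
    df
    .drop(columns=['A'])   # kept only to mirror the chain above
    ['B']                  # selecting one column yields a Series
    .str.get_dummies()     # Series-level method, so it chains
)

print(dummies)
```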
