
I know two ways of adding a new column to a pandas DataFrame:

df_new = df.assign(new_column=default_value) 

and

df[new_column] = default_value 

The first one does not add the column in place, but the second one does. So which one is more efficient to use?

Apart from these two, is there an even more efficient method?

  • I think the question is about performance, so I reopened it. Commented Sep 12, 2018 at 7:22
  • The original question discusses relative performance in the comments. stackoverflow.com/questions/12555323/… Commented Sep 12, 2018 at 7:22

1 Answer

I think the second one is faster. assign is meant for cases where you want clean code that chains all the operations into a single expression:

df = pd.DataFrame({'A':np.random.rand(10000)})
default_value = 10

In [114]: %timeit df_new = df.assign(new_column=default_value)
228 µs ± 4.26 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [115]: %timeit df['new_column'] = default_value
86.1 µs ± 654 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
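For anyone who wants to repeat the comparison outside IPython, here is a minimal sketch using the standard-library timeit module. The helper name time_both, the row count, and the repeat count are illustrative choices, not part of the measurements above. Note that after the first call the item assignment overwrites an existing column rather than adding a new one, so the numbers are only a rough guide and will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

default_value = 10


def time_both(n_rows=10_000, number=200):
    """Return average seconds per call as (assign_time, setitem_time).

    time_both is a hypothetical helper written for this sketch.
    """
    df = pd.DataFrame({"A": np.random.rand(n_rows)})
    # Separate frame for the in-place write, so df stays untouched for assign.
    target = df.copy()

    def with_assign():
        # Builds and returns a new DataFrame on every call.
        return df.assign(new_column=default_value)

    def with_setitem():
        # Adds the column on the first call, overwrites it on later calls.
        target["new_column"] = default_value

    assign_total = timeit.timeit(with_assign, number=number)
    setitem_total = timeit.timeit(with_setitem, number=number)
    return assign_total / number, setitem_total / number


if __name__ == "__main__":
    assign_time, setitem_time = time_both()
    print(f"assign:  {assign_time * 1e6:.1f} µs per call")
    print(f"setitem: {setitem_time * 1e6:.1f} µs per call")
```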

I used perfplot for plotting:

[Plot: perfplot timings of chained vs. no_chained against len(df), log-log axes]


import numpy as np
import pandas as pd
import perfplot

default_value = 10

def chained(df):
    # assign returns a new DataFrame with the extra column
    df = df.assign(new_column=default_value)
    return df

def no_chained(df):
    # item assignment adds the column to df in place
    df['new_column'] = default_value
    return df

def make_df(n):
    # one-column frame with n random values
    df = pd.DataFrame({'A':np.random.rand(n)})
    return df

perfplot.show(
    setup=make_df,
    kernels=[chained, no_chained],
    n_range=[2**k for k in range(2, 25)],
    logx=True,
    logy=True,
    equality_check=False,
    xlabel='len(df)')
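Speed aside, the two approaches behave differently, which is why assign suits chaining. The sketch below shows that on a small frame. The five-row data and the filter step are made up for illustration and are not part of the benchmark.

```python
import numpy as np
import pandas as pd

default_value = 10

df = pd.DataFrame({"A": np.arange(5)})

# assign returns a new DataFrame, so it can sit in the middle of a method chain.
df_new = (
    df.assign(new_column=default_value)
      .loc[lambda d: d["A"] > 1]      # illustrative follow-up step
      .reset_index(drop=True)
)

# The original frame is unchanged by assign.
print(list(df.columns))       # ['A']
print(list(df_new.columns))   # ['A', 'new_column']

# Item assignment modifies df itself and returns nothing to chain on.
df["new_column"] = default_value
print(list(df.columns))       # ['A', 'new_column']
```

If the original frame has to stay untouched, or the new column is one step in a longer pipeline, assign is the natural fit; otherwise plain item assignment avoids building a new frame.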