Pandas: Avoid inplace
Sourcery suggestion id: pandas-avoid-inplace
Available starting with version 1.1.0
Description
Don't use inplace for methods that always create a copy under the hood.
Before

    import pandas as pd

    df = pd.DataFrame(
        [
            ["Python", 190],
            ["JavaScript", 33],
        ],
        columns=["Language", "Number of rules"],
    )
    df.sort_values("Language", inplace=True)

After

    import pandas as pd

    df = pd.DataFrame(
        [
            ["Python", 190],
            ["JavaScript", 33],
        ],
        columns=["Language", "Number of rules"],
    )
    df = df.sort_values("Language")

Before

    import pandas as pd

    df = pd.DataFrame(
        [
            ["Python", 190],
            ["JavaScript", 33],
        ],
        columns=["Language", "Number of rules"],
    )
    df.copy().sort_values("Language", inplace=True)

After

    import pandas as pd

    df = pd.DataFrame(
        [
            ["Python", 190],
            ["JavaScript", 33],
        ],
        columns=["Language", "Number of rules"],
    )
    df.copy().sort_values("Language")

Explanation
Some DataFrame methods can never operate inplace. Their operation (like reordering rows) requires copying, so they create a copy even if you provide inplace=True.
For these methods, inplace doesn't bring a performance gain.
It's only "syntactic sugar for reassigning the new result to the calling DataFrame/Series."
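As a minimal sketch of this equivalence, the snippet below sorts the same sample data once with inplace=True and once with reassignment, then compares the results. The variable names (inplace_df, reassigned_df, return_value) are chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Python", 190],
        ["JavaScript", 33],
    ],
    columns=["Language", "Number of rules"],
)

# Variant 1: sort with inplace=True. The call itself returns None.
inplace_df = df.copy()
return_value = inplace_df.sort_values("Language", inplace=True)

# Variant 2: reassign the sorted result to the same name.
reassigned_df = df.copy()
reassigned_df = reassigned_df.sort_values("Language")

print(return_value)  # None
print(inplace_df.equals(reassigned_df))  # True
```

Both variants end up with the same sorted DataFrame. The only observable difference is the return value of the call.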
Drawbacks of using inplace:
- You can't use method chaining with inplace=True.
- The inplace keyword complicates type annotations (because the return value depends on the value of inplace).
- Using inplace=True gives code that mutates the state of an object and thus has side effects. That can introduce subtle bugs and is harder to debug.
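The first two drawbacks can be illustrated with a short sketch. It assumes the same sample data as above. reset_index is used here only as an example of a follow-up call in a chain, and chain_error is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["Python", 190],
        ["JavaScript", 33],
    ],
    columns=["Language", "Number of rules"],
)

# Without inplace, each call returns a DataFrame, so calls can be chained.
chained = df.sort_values("Language").reset_index(drop=True)

# With inplace=True, sort_values returns None, so the chain breaks.
try:
    df.copy().sort_values("Language", inplace=True).reset_index(drop=True)
    chain_error = None
except AttributeError as error:
    chain_error = error

print(chained["Language"].tolist())  # ['JavaScript', 'Python']
print(type(chain_error).__name__)  # AttributeError
```

Because the return type switches between a DataFrame and None depending on inplace, a type checker needs overloads to describe the method accurately.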
PDEP-8 proposes deprecating the inplace option for methods that can never operate inplace.
Best practice: Explicitly reassign the result to the caller DataFrame.
E.g.
    df = df.sort_values("Language")

In cases where the caller isn't a variable but an expression, inplace has no effect on any variable anyway.

    df.copy().sort_values("Language", inplace=True)

copy creates a new DataFrame object, which isn't assigned to any variable. inplace doesn't change the df object; it changes this temporary copy instead.
In this case, the only effect of inplace is that the expression returns None instead of a new DataFrame.
Thus, it should be omitted for clarity.
    df.copy().sort_values("Language")

DataFrame Methods Affected
These DataFrame methods always create a copy under the hood even if you provide the inplace keyword. In PDEP-8, they are mentioned as "Group 4" methods.
- dropna
- drop_duplicates
- sort_values
- sort_index
- eval
- query
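Several of these methods often appear together in a cleanup step. The following sketch shows one way to write such a step without inplace, as a single chain. The extra rows (a duplicate and a missing value) and the threshold in query are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with a duplicate row and a missing value.
df = pd.DataFrame(
    [
        ["Python", 190],
        ["JavaScript", 33],
        ["JavaScript", 33],
        ["TypeScript", None],
    ],
    columns=["Language", "Number of rules"],
)

# Each method returns a new DataFrame, so the steps chain naturally.
cleaned = (
    df.dropna()
    .drop_duplicates()
    .query("`Number of rules` > 10")
    .sort_values("Language")
)

print(cleaned["Language"].tolist())  # ['JavaScript', 'Python']
print(len(df))  # 4, the original DataFrame is left untouched
```

The original df stays available for later steps, and the result is bound to a new name instead of silently replacing the input.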