Open
Labels
Bug, Datetime, Missing-data
Description
pandas generally tries to coerce values to fit the column dtype, or upcasts the dtype to fit.
For a setting operation this is convenient and, I think, what a user expects:
    In [35]: df = DataFrame({'A' : Series(dtype='M8[ns]'), 'B' : Series([np.nan],dtype='object'), 'C' : ['foo'], 'D' : [1]})

    In [36]: df
    Out[36]:
         A    B    C  D
    0  NaT  NaN  foo  1

    In [37]: df.dtypes
    Out[37]:
    A    datetime64[ns]
    B            object
    C            object
    D             int64
    dtype: object

    In [38]: df.loc[0,'D'] = 1.0

    In [39]: df.dtypes
    Out[39]:
    A    datetime64[ns]
    B            object
    C            object
    D           float64
    dtype: object

However, for a .fillna (or .replace) operation this might be a bit unexpected. Here A is coerced to object dtype, even though it was datetime64[ns].
    In [40]: df.fillna('')
    Out[40]:
      A B    C  D
    0      foo  1

    In [41]: df.fillna('').dtypes
    Out[41]:
    A     object
    B     object
    C     object
    D    float64
    dtype: object

So a possibility is to add a keyword errors='raise'|'coerce'|'ignore'. The behavior shown above would be the equivalent of errors='ignore', while skipping this column would be done with errors='coerce' (and of course raise would raise).
Ideally the default would be coerce, I think (to skip columns where the fill value is not compatible). Any thoughts on this?