Skip to content
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
Modify nanminmax to better handle object dtype
  • Loading branch information
Daniel Saxton authored and Daniel Saxton committed Feb 14, 2020
commit 12a232344a70239e16e4bc3a92f24dcfd06c9df4
8 changes: 8 additions & 0 deletions pandas/core/nanops.py
Original file line number Diff line number Diff line change
Expand Up @@ -854,6 +854,8 @@ def reduction(
mask: Optional[np.ndarray] = None,
) -> Dtype:

na_mask = isna(values)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should already have the mask (pass it in when you call this).


values, mask, dtype, dtype_max, fill_value = _get_values(
values, skipna, fill_value_typ=fill_value_typ, mask=mask
)
Expand All @@ -864,6 +866,12 @@ def reduction(
result.fill(np.nan)
except (AttributeError, TypeError, ValueError):
result = np.nan
elif is_object_dtype(dtype) and na_mask.any():
# Need to explicitly mask NA values for object dtypes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

if skipna:
result = getattr(values[~na_mask], meth)(axis)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This masking could also be done in the min/max functions? (as you had before?)

Or, another option might be to add a min/max function to mask_ops.py, similarly as I am doing for sum in #30982 (but it should be simpler for min/max, as those don't need to handle the min_count)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a benefit of having it here is that this also fixes a bug for Series: pd.Series(["a", np.nan]).min() currently raises even though it shouldn't

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's a good point. Can you add a test for that, then?

Now, that aside, I think longer term we still want the separate min/max in mask_ops.py, so it can also be used for the int dtypes. But that can then certainly be done for a separate PR.

else:
return np.nan
else:
result = getattr(values, meth)(axis)

Expand Down