Skip to content
Closed
Show file tree
Hide file tree
Changes from 13 commits
Commits
File filter

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions doc/source/whatsnew/v1.1.1.rst
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,10 @@ Bug fixes
- Bug in ``.groupby(..).rolling(..)`` where passing ``closed`` with column selection would raise a ``ValueError`` (:issue:`35549`)
- Bug in :class:`DataFrame` constructor failing to raise ``ValueError`` in some cases when ``data`` and ``index`` have mismatched lengths (:issue:`33437`)

**Reshaping**

- Bug where :meth:`DataFrame.replace` and :meth:`Series.replace` would fail when replacement value was of a different dtype but equal to values in the original object (:issue:`34871`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move to 1.2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks similar to regression reported in #35376 with open PR (with different fix) #36444

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think it's the same issue. If another fix avoids astyping to object I would think it's preferred.


.. ---------------------------------------------------------------------------

.. _whatsnew_111.contributors:
Expand Down
10 changes: 7 additions & 3 deletions pandas/core/internals/blocks.py
Original file line number Diff line number Diff line change
Expand Up @@ -714,9 +714,13 @@ def replace(
# retry
if not self._can_hold_element(to_replace):
if not isinstance(to_replace, list):
if inplace:
return [self]
return [self.copy()]
return self.astype(object).replace(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we only want to do this if these are different types, not generally. please show asv's for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the if not self._can_hold_element(to_replace) check is making sure we have different types.

Not seeing any performance regressions from the change (at least in the replace benchmarks):

(pandas-dev) > asv continuous -f 1.1 upstream/master replace-diff-dtype -b ^replace · Creating environments · Discovering benchmarks ·· Uninstalling from conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt. ·· Installing 1e014777 <replace-diff-dtype> into conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt.. · Running 12 total benchmarks (2 commits * 1 environments * 6 benchmarks) [ 0.00%] · For pandas commit 9114b4bc <master> (round 1/2): [ 0.00%] ·· Building for conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt... [ 0.00%] ·· Benchmarking conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt [ 4.17%] ··· Running (replace.Convert.time_replace--).... [ 20.83%] ··· Running (replace.ReplaceList.time_replace_list--).. [ 25.00%] · For pandas commit 1e014777 <replace-diff-dtype> (round 1/2): [ 25.00%] ·· Building for conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt... [ 25.00%] ·· Benchmarking conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt [ 29.17%] ··· Running (replace.Convert.time_replace--).... [ 45.83%] ··· Running (replace.ReplaceList.time_replace_list--).. [ 50.00%] · For pandas commit 1e014777 <replace-diff-dtype> (round 2/2): [ 50.00%] ·· Benchmarking conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt [ 54.17%] ··· replace.Convert.time_replace ok [ 54.17%] ··· ============= ============ ============ -- replace_data ------------- ------------------------- constructor Timestamp Timedelta ============= ============ ============ DataFrame 98.6±3ms 95.2±0.5ms Series 74.0±0.3ms 74.6±1ms ============= ============ ============ [ 58.33%] ··· replace.FillNa.time_fillna ok [ 58.33%] ··· ========= ============= inplace --------- ------------- True 1.04±0.01ms False 3.01±0.3ms ========= ============= [ 62.50%] ··· replace.FillNa.time_replace ok [ 62.50%] ··· ========= ============= inplace --------- ------------- True 1.10±0.02ms False 4.02±0.07ms ========= ============= [ 66.67%] ··· replace.ReplaceDict.time_replace_series ok [ 66.67%] ··· ========= ============ inplace --------- ------------ True 4.27±0.02s False 4.24±0.04s ========= ============ [ 70.83%] ··· replace.ReplaceList.time_replace_list ok [ 70.83%] ··· ========= ============ inplace --------- ------------ True 51.6±0.6μs False 405±5ms ========= ============ [ 75.00%] ··· replace.ReplaceList.time_replace_list_one_match ok [ 75.00%] ··· ========= ========== inplace --------- ---------- True 89.2±3ms False 467±4ms ========= ========== [ 75.00%] · For pandas commit 9114b4bc <master> (round 2/2): [ 75.00%] ·· Building for conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt... [ 75.00%] ·· Benchmarking conda-py3.7-Cython0.29.16-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pytables-pytest-scipy-sqlalchemy-xlrd-xlsxwriter-xlwt [ 79.17%] ··· replace.Convert.time_replace ok [ 79.17%] ··· ============= =========== =========== -- replace_data ------------- ----------------------- constructor Timestamp Timedelta ============= =========== =========== DataFrame 102±4ms 103±3ms Series 80.0±4ms 83.1±4ms ============= =========== =========== [ 83.33%] ··· replace.FillNa.time_fillna ok [ 83.33%] ··· ========= ============= inplace --------- ------------- True 1.10±0.04ms False 3.04±0.1ms ========= ============= [ 87.50%] ··· replace.FillNa.time_replace ok [ 87.50%] ··· ========= ============= inplace --------- ------------- True 1.20±0.04ms False 4.26±0.2ms ========= ============= [ 91.67%] ··· replace.ReplaceDict.time_replace_series ok [ 91.67%] ··· ========= ============ inplace --------- ------------ True 4.26±0.01s False 4.24±0.01s ========= ============ [ 95.83%] ··· replace.ReplaceList.time_replace_list ok [ 95.83%] ··· ========= ========== inplace --------- ---------- True 49.6±1μs False 412±6ms ========= ========== [100.00%] ··· replace.ReplaceList.time_replace_list_one_match ok [100.00%] ··· ========= ========== inplace --------- ---------- True 89.3±2ms False 464±30ms ========= ========== BENCHMARKS NOT SIGNIFICANTLY CHANGED. 
to_replace=to_replace,
value=value,
inplace=inplace,
regex=regex,
convert=convert,
)

to_replace = [x for x in to_replace if self._can_hold_element(x)]
if not len(to_replace):
Expand Down
7 changes: 7 additions & 0 deletions pandas/tests/frame/methods/test_replace.py
Original file line number Diff line number Diff line change
Expand Up @@ -1492,6 +1492,13 @@ def test_replace_with_duplicate_columns(self, replacement):

tm.assert_frame_equal(result, expected)

def test_replace_equal_value_different_dtype(self):
# https://github.com/pandas-dev/pandas/issues/34871
df = pd.DataFrame([1, 2, 3])
result = df.replace(1.0, 0)
expected = pd.DataFrame([0, 2, 3])
tm.assert_frame_equal(result, expected)

@pytest.mark.xfail(
reason="replace() changes dtype from period to object, see GH34871", strict=True
)
Expand Down
7 changes: 7 additions & 0 deletions pandas/tests/series/methods/test_replace.py
Original file line number Diff line number Diff line change
Expand Up @@ -442,6 +442,13 @@ def test_replace_extension_other(self):
ser = pd.Series(pd.array([1, 2, 3], dtype="Int64"))
ser.replace("", "") # no exception

def test_replace_equal_value_different_dtype(self):
# https://github.com/pandas-dev/pandas/issues/34871
ser = pd.Series([1, 2, 3])
result = ser.replace(1.0, 0)
expected = pd.Series([0, 2, 3])
tm.assert_series_equal(result, expected)

def test_replace_with_compiled_regex(self):
# https://github.com/pandas-dev/pandas/issues/35680
s = pd.Series(["a", "b", "c"])
Expand Down