BUG: Fix __getitem__ KeyError for np.True_/np.False_ column keys#64822
Open
alvinttang wants to merge 1 commit intopandas-dev:mainfrom
Open
BUG: Fix __getitem__ KeyError for np.True_/np.False_ column keys#64822alvinttang wants to merge 1 commit intopandas-dev:mainfrom
alvinttang wants to merge 1 commit intopandas-dev:mainfrom
Conversation
…64749) The fix in GH#62888 added a PyBool_Check guard to distinguish Python bools from Python ints in the object-dtype hash table. However, that check also prevented equality between np.True_ (a numpy bool scalar, not a PyBool) and Python True, breaking DataFrame.__getitem__ for columns created with numpy bool keys. Narrow the guard so it only fires when the non-bool side is a Python int (PyLong_CheckExact). Numpy bool scalars fall through to PyObject_RichCompareBool, which correctly returns True for np.True_ == True. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| - Bugs in setitem-with-expansion when adding new rows failing to keep the original dtype in some cases (:issue:`32346`, :issue:`15231`, :issue:`47503`, :issue:`6485`, :issue:`25383`, :issue:`52235`, :issue:`17026`, :issue:`56010`) | ||
| - Bug in :meth:`Index.get_level_values` mishandling boolean, NA-like (``np.nan``, ``pd.NA``, ``pd.NaT``) and integer index names (:issue:`62169`) | ||
| - Bug in :meth:`MultiIndex.loc` returning incorrect results when indexing with :class:`numpy.datetime64` on a level containing :class:`datetime.date` objects (:issue:`55969`) | ||
| - Bug in :meth:`DataFrame.__getitem__` raising ``KeyError`` when a column was created with a ``numpy`` bool scalar (e.g. ``np.True_``) and accessed with a Python ``bool`` (e.g. ``True``) (:issue:`64749`) |
Member
There was a problem hiding this comment.
the bug is not present in a released version, does not need a whatsnew
| // frozenset isn't yet supported | ||
| } else if (PyBool_Check(a) != PyBool_Check(b)) { | ||
| } else if (PyBool_Check(a) != PyBool_Check(b) && | ||
| (PyLong_CheckExact(a) || PyLong_CheckExact(b))) { |
Member
There was a problem hiding this comment.
better to explicitly catch cnp.bool_?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #64749.
The fix in #64639 added
PyBool_Check(a) != PyBool_Check(b)to thepyobject_cmpfunction inkhash_python.hto distinguish Pythonboolfrom Pythonint. However,np.True_andnp.False_are numpy bool scalars —PyBool_Check(np.True_)returns 0 — so they were incorrectly treated as unequal to PythonTrue/False, breaking hash-table lookups for columns created with numpy bool keys.Root cause:
PyBool_Check(True) = 1butPyBool_Check(np.True_) = 0, soPyBool_Check(a) != PyBool_Check(b)fires and returns 0 (not equal) even thoughnp.True_ == Trueshould hold.Fix: Narrow the guard by also requiring
PyLong_CheckExact(a) || PyLong_CheckExact(b). This ensures the short-circuit only applies when the non-bool side is a Pythonint. Numpy bool scalars are neitherPyBoolnorPyLong, so they fall through toPyObject_RichCompareBool, which correctly handlesnp.True_ == True.Changes
pandas/_libs/include/pandas/vendored/klib/khash_python.h: addPyLong_CheckExactguard to thePyBool_Checkconditionpandas/tests/frame/indexing/test_getitem.py: addTestGetitemNumpyBoolwith 4 testsdoc/source/whatsnew/v3.1.0.rst: add entry under Indexing bugsTest plan
test_getitem_nptrue_column_with_true—df[True]on a{True: ...}DataFrame workstest_getitem_nptrue_column_after_concat— exact repro from the issue (concat then index)test_getitem_npfalse_column_with_false— same fornp.False_/Falsetest_getitem_bool_int_still_distinct— GH#62888 regression guard: PythonTrueand1remain distinct column keys🤖 Generated with Claude Code