BUG: Fix __getitem__ KeyError for np.True_/np.False_ column keys #64822

Open
alvinttang wants to merge 1 commit into pandas-dev:main from alvinttang:fix/getitem-nptrue-column

@alvinttang

Summary

Fixes #64749.

The fix in #64639 added PyBool_Check(a) != PyBool_Check(b) to the pyobject_cmp function in khash_python.h to distinguish Python bool from Python int. However, np.True_ and np.False_ are numpy bool scalars — PyBool_Check(np.True_) returns 0 — so they were incorrectly treated as unequal to Python True/False, breaking hash-table lookups for columns created with numpy bool keys.

Root cause: PyBool_Check(True) = 1 but PyBool_Check(np.True_) = 0, so PyBool_Check(a) != PyBool_Check(b) fires and returns 0 (not equal) even though np.True_ == True should hold.
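To make the root cause concrete, here is a minimal pure-Python model of the comparison. It is a sketch, not the khash code: it assumes type(x) is bool is an adequate stand-in for PyBool_Check (bool cannot be subclassed) and that a == b stands in for PyObject_RichCompareBool. NumpyBoolLike, old_cmp and lookup are hypothetical names introduced only for illustration; NumpyBoolLike imitates np.True_/np.False_ so the example runs without numpy.

```python
from collections.abc import Callable


class NumpyBoolLike:
    """Hypothetical stand-in for a numpy bool scalar such as np.True_.

    Like np.bool_, it is neither a bool nor an int subclass, yet it compares
    equal to True/False (and 1/0) and hashes the same way.
    """

    def __init__(self, value: bool) -> None:
        self.value = bool(value)

    def __eq__(self, other: object) -> bool:
        if isinstance(other, NumpyBoolLike):
            return self.value == other.value
        if isinstance(other, int):  # bool is an int subclass, so this covers it
            return self.value == other
        return NotImplemented

    def __hash__(self) -> int:
        return hash(self.value)


def py_bool_check(obj: object) -> bool:
    """Model of PyBool_Check: exact bool (bool cannot be subclassed)."""
    return type(obj) is bool


def old_cmp(a: object, b: object) -> bool:
    """Model of the original guard: a bool never equals a non-bool."""
    if py_bool_check(a) != py_bool_check(b):
        return False  # short-circuit: treated as "not equal"
    return a == b  # stand-in for PyObject_RichCompareBool


def lookup(keys: list, probe: object, cmp: Callable[[object, object], bool]) -> bool:
    """Hash-table lookup reduced to: matching hash, then cmp() must agree."""
    return any(hash(k) == hash(probe) and cmp(k, probe) for k in keys)


np_true = NumpyBoolLike(True)
print(np_true == True)                   # True: plain equality holds
print(old_cmp(np_true, True))            # False: the guard short-circuits
print(lookup([np_true], True, old_cmp))  # False: the column key is "missing"
```

The hashes match (both are 1), so the probe lands on the right slot; it is only the short-circuiting comparison that rejects it. In the same model, old_cmp(True, 1) is also False, which is the distinction GH#62888 needed.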

Fix: Narrow the guard by also requiring PyLong_CheckExact(a) || PyLong_CheckExact(b). This ensures the short-circuit only applies when the non-bool side is a Python int. Numpy bool scalars are neither PyBool nor PyLong, so they fall through to PyObject_RichCompareBool, which correctly handles np.True_ == True.
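A minimal pure-Python model of the narrowed guard, offered as a sketch rather than the real khash code. It assumes type(x) is bool models PyBool_Check, type(x) is int models PyLong_CheckExact (True is an int subclass, so it is not an exact int), and a == b models PyObject_RichCompareBool. NumpyBoolLike and new_cmp are hypothetical names; NumpyBoolLike imitates a numpy bool scalar so no numpy install is needed.

```python
class NumpyBoolLike:
    """Hypothetical stand-in for np.True_/np.False_ (not a bool or int subclass)."""

    def __init__(self, value: bool) -> None:
        self.value = bool(value)

    def __eq__(self, other: object) -> bool:
        if isinstance(other, NumpyBoolLike):
            return self.value == other.value
        if isinstance(other, int):  # covers bool too
            return self.value == other
        return NotImplemented

    def __hash__(self) -> int:
        return hash(self.value)


def py_bool_check(obj: object) -> bool:
    return type(obj) is bool  # models PyBool_Check


def py_long_check_exact(obj: object) -> bool:
    return type(obj) is int  # models PyLong_CheckExact; False for True/False


def new_cmp(a: object, b: object) -> bool:
    """Model of the narrowed guard: only bool-vs-exact-int short-circuits."""
    if py_bool_check(a) != py_bool_check(b) and (
        py_long_check_exact(a) or py_long_check_exact(b)
    ):
        return False
    return a == b  # stand-in for PyObject_RichCompareBool


np_true, np_false = NumpyBoolLike(True), NumpyBoolLike(False)
print(new_cmp(np_true, True))    # True: falls through to rich comparison
print(new_cmp(np_false, False))  # True
print(new_cmp(True, 1))          # False: bool vs exact int stays distinct
```

These three calls correspond to the np.True_/True, np.False_/False and True-vs-1 cases in the test plan, although the real tests exercise DataFrame.__getitem__ rather than this model.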

Changes

  • pandas/_libs/include/pandas/vendored/klib/khash_python.h: add PyLong_CheckExact guard to the PyBool_Check condition
  • pandas/tests/frame/indexing/test_getitem.py: add TestGetitemNumpyBool with 4 tests
  • doc/source/whatsnew/v3.1.0.rst: add entry under Indexing bugs

Test plan

  • test_getitem_nptrue_column_with_true: df[True] on a {True: ...} DataFrame works
  • test_getitem_nptrue_column_after_concat: exact repro from the issue (concat, then index)
  • test_getitem_npfalse_column_with_false: same check for np.False_/False
  • test_getitem_bool_int_still_distinct: GH#62888 regression guard; Python True and 1 remain distinct column keys

🤖 Generated with Claude Code

…64749)

The fix in GH#62888 added a PyBool_Check guard to distinguish Python bools from Python ints in the object-dtype hash table. However, that check also prevented equality between np.True_ (a numpy bool scalar, not a PyBool) and Python True, breaking DataFrame.__getitem__ for columns created with numpy bool keys. Narrow the guard so it only fires when the non-bool side is a Python int (PyLong_CheckExact). Numpy bool scalars fall through to PyObject_RichCompareBool, which correctly returns True for np.True_ == True.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Bugs in setitem-with-expansion when adding new rows failing to keep the original dtype in some cases (:issue:`32346`, :issue:`15231`, :issue:`47503`, :issue:`6485`, :issue:`25383`, :issue:`52235`, :issue:`17026`, :issue:`56010`)
- Bug in :meth:`Index.get_level_values` mishandling boolean, NA-like (``np.nan``, ``pd.NA``, ``pd.NaT``) and integer index names (:issue:`62169`)
- Bug in :meth:`MultiIndex.loc` returning incorrect results when indexing with :class:`numpy.datetime64` on a level containing :class:`datetime.date` objects (:issue:`55969`)
- Bug in :meth:`DataFrame.__getitem__` raising ``KeyError`` when a column was created with a ``numpy`` bool scalar (e.g. ``np.True_``) and accessed with a Python ``bool`` (e.g. ``True``) (:issue:`64749`)

Member

The bug is not present in a released version, so this does not need a whatsnew entry.

  // frozenset isn't yet supported
- } else if (PyBool_Check(a) != PyBool_Check(b)) {
+ } else if (PyBool_Check(a) != PyBool_Check(b) &&
+            (PyLong_CheckExact(a) || PyLong_CheckExact(b))) {

Member

better to explicitly catch cnp.bool_?
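One way to read this suggestion, sketched as a pure-Python model rather than actual khash code: treat numpy bool scalars as bools in the guard instead of narrowing it to exact ints. NumpyBoolLike (a hypothetical stand-in for cnp.bool_), is_boolish and alt_cmp are names introduced only for illustration; a C version would need a real numpy bool type check, which is not shown here.

```python
class NumpyBoolLike:
    """Hypothetical stand-in for a numpy bool scalar (cnp.bool_)."""

    def __init__(self, value: bool) -> None:
        self.value = bool(value)

    def __eq__(self, other: object) -> bool:
        if isinstance(other, NumpyBoolLike):
            return self.value == other.value
        if isinstance(other, int):  # covers bool too
            return self.value == other
        return NotImplemented

    def __hash__(self) -> int:
        return hash(self.value)


def is_boolish(obj: object) -> bool:
    """Python bool OR numpy-bool-like scalar counts as a bool for the guard."""
    return type(obj) is bool or isinstance(obj, NumpyBoolLike)


def alt_cmp(a: object, b: object) -> bool:
    """Model of a guard that explicitly treats numpy bools as bools."""
    if is_boolish(a) != is_boolish(b):
        return False
    return a == b  # stand-in for PyObject_RichCompareBool


np_true = NumpyBoolLike(True)
print(alt_cmp(np_true, True))  # True: both sides count as bools
print(alt_cmp(True, 1))        # False
print(alt_cmp(np_true, 1))     # False: numpy bool vs int is also kept apart
```

In this model the difference from the PyLong_CheckExact version is the last case: narrowing to exact ints lets a numpy bool fall through to rich comparison against 1, where it compares equal, whereas an explicit bool_ check keeps it distinct, matching how Python True and 1 are treated.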

