ENH: Implement Kleene logic for BooleanArray #29842
Changes from all commits
bb904cb 13c7ea3 fff786f 4067e7f 708c553 c56894e 2e9d547 373aaab 7f78a64 36b171b 747e046 d0a8cca fe061b0 9f9e44c 0a34257 2ba0034 2d1129a a24fc22 77dd1fc 7b9002c c18046b 1237caa 2ecf9b8 87aeb09 969b6dc 1c9ba49 8eec954 cb47b6a 2a946b9 efb6f8b 004238e 5a2c81c 7032318 bbb7f9b ce763b4 5bc5328 457bd08 31c2bc6
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| import numpy as np | ||
| | ||
| import pandas as pd | ||
| | ||
| | ||
| class TimeLogicalOps: | ||
| def setup(self): | ||
| N = 10_000 | ||
| left, right, lmask, rmask = np.random.randint(0, 2, size=(4, N)).astype("bool") | ||
| self.left = pd.arrays.BooleanArray(left, lmask) | ||
| self.right = pd.arrays.BooleanArray(right, rmask) | ||
| | ||
| def time_or_scalar(self): | ||
| self.left | True | ||
| self.left | False | ||
| | ||
| def time_or_array(self): | ||
| self.left | self.right | ||
| | ||
| def time_and_scalar(self): | ||
| self.left & True | ||
| self.left & False | ||
| | ||
| def time_and_array(self): | ||
| self.left & self.right | ||
| | ||
| def time_xor_scalar(self): | ||
| self.left ^ True | ||
| self.left ^ False | ||
| | ||
| def time_xor_array(self): | ||
| self.left ^ self.right | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,79 @@ | ||
| .. currentmodule:: pandas | ||
| | ||
| .. ipython:: python | ||
| :suppress: | ||
| | ||
| import pandas as pd | ||
| import numpy as np | ||
| | ||
| .. _boolean: | ||
| | ||
| ************************** | ||
| Nullable Boolean Data Type | ||
| ************************** | ||
| | ||
| .. versionadded:: 1.0.0 | ||
| | ||
| .. _boolean.kleene: | ||
| | ||
| Kleene Logical Operations | ||
| ------------------------- | ||
| | ||
| :class:`arrays.BooleanArray` implements `Kleene Logic`_ (sometimes called three-value logic) for | ||
| logical operations like ``&`` (and), ``|`` (or) and ``^`` (exclusive-or). | ||
| | ||
| This table demonstrates the results for every combination. These operations are symmetrical, | ||
| so flipping the left- and right-hand side makes no difference in the result. | ||
| | ||
| ================= ========= | ||
| Expression Result | ||
| ================= ========= | ||
| ``True & True`` ``True`` | ||
| ``True & False`` ``False`` | ||
| ``True & NA`` ``NA`` | ||
| ``False & False`` ``False`` | ||
| ``False & NA`` ``False`` | ||
| ``NA & NA`` ``NA`` | ||
| ``True | True`` ``True`` | ||
| ``True | False`` ``True`` | ||
| ``True | NA`` ``True`` | ||
| ``False | False`` ``False`` | ||
| ``False | NA`` ``NA`` | ||
| ``NA | NA`` ``NA`` | ||
| ``True ^ True`` ``False`` | ||
| ``True ^ False`` ``True`` | ||
| ``True ^ NA`` ``NA`` | ||
| ``False ^ False`` ``False`` | ||
| ``False ^ NA`` ``NA`` | ||
| ``NA ^ NA`` ``NA`` | ||
| ================= ========= | ||
| | ||
| When an ``NA`` is present in an operation, the output value is ``NA`` only if | ||
| the result cannot be determined solely based on the other input. For example, | ||
| ``True | NA`` is ``True``, because both ``True | True`` and ``True | False`` | ||
| are ``True``. In that case, we don't actually need to consider the value | ||
| of the ``NA``. | ||
| | ||
| On the other hand, ``True & NA`` is ``NA``. The result depends on whether | ||
| the ``NA`` really is ``True`` or ``False``, since ``True & True`` is ``True``, | ||
| but ``True & False`` is ``False``, so we can't determine the output. | ||
| | ||
| | ||
| This differs from how ``np.nan`` behaves in logical operations. Pandas treats | ||
| ``np.nan`` as *always false in the output*. | ||
| | ||
| In ``or`` | ||
| | ||
| .. ipython:: python | ||
| | ||
| pd.Series([True, False, np.nan], dtype="object") | True | ||
| pd.Series([True, False, np.nan], dtype="boolean") | True | ||
| | ||
| In ``and`` | ||
| | ||
| .. ipython:: python | ||
| | ||
| pd.Series([True, False, np.nan], dtype="object") & True | ||
| pd.Series([True, False, np.nan], dtype="boolean") & True | ||
| | ||
| .. _Kleene Logic: https://en.wikipedia.org/wiki/Three-valued_logic#Kleene_and_Priest_logics | ||
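The rule stated in the documentation above (the output is ``NA`` only when the result cannot be determined from the other input) can be checked by brute force. The following is a minimal pure-Python sketch and is not part of this PR: `None` is an assumed stand-in for `pd.NA`, and the helper name `kleene` is hypothetical. It replaces each `NA` operand with both possible values and returns a definite answer only if every substitution agrees.

```python
import operator
from itertools import product

NA = None  # assumption: None stands in for pandas.NA in this sketch


def kleene(op, left, right):
    """Apply a binary boolean op under three-valued logic (hypothetical helper)."""
    # An NA operand could "really" be either True or False.
    left_options = [True, False] if left is NA else [left]
    right_options = [True, False] if right is NA else [right]
    # Collect the result of every possible substitution.
    outcomes = {op(l, r) for l, r in product(left_options, right_options)}
    # If all substitutions agree, the NA did not matter; otherwise the result is unknown.
    return outcomes.pop() if len(outcomes) == 1 else NA


# Print the full table for and, or, and exclusive-or.
for name, op in [("&", operator.and_), ("|", operator.or_), ("^", operator.xor)]:
    for left, right in product([True, False, NA], repeat=2):
        print(f"{left!s:>5} {name} {right!s:<5} -> {kleene(op, left, right)}")
```

Enumerating substitutions is far too slow for arrays, but it makes the definition explicit and is a convenient oracle when testing a vectorized implementation.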
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -184,6 +184,9 @@ class BooleanArray(ExtensionArray, ExtensionOpsMixin): | |
| represented by 2 numpy arrays: a boolean array with the data and | ||
| a boolean array with the mask (True indicating missing). | ||
| | ||
| BooleanArray implements Kleene logic (sometimes called three-value | ||
| logic) for logical operations. See :ref:`boolean.kleene` for more. | ||
| | ||
| To construct a BooleanArray from generic array-like input, use | ||
| :func:`pandas.array` specifying ``dtype="boolean"`` (see examples | ||
| below). | ||
| | @@ -283,7 +286,7 @@ def __getitem__(self, item): | |
| | ||
| def _coerce_to_ndarray(self, dtype=None, na_value: "Scalar" = libmissing.NA): | ||
| """ | ||
| Coerce to an ndarary of object dtype or bool dtype (if force_bool=True). | ||
| Coerce to an ndarray of object dtype or bool dtype (if force_bool=True). | ||
| | ||
| Parameters | ||
| ---------- | ||
| | @@ -565,33 +568,40 @@ def logical_method(self, other): | |
| # Rely on pandas to unbox and dispatch to us. | ||
| return NotImplemented | ||
| | ||
| assert op.__name__ in {"or_", "ror_", "and_", "rand_", "xor", "rxor"} | ||
| other = lib.item_from_zerodim(other) | ||
| other_is_booleanarray = isinstance(other, BooleanArray) | ||
| other_is_scalar = lib.is_scalar(other) | ||
| mask = None | ||
| | ||
| if isinstance(other, BooleanArray): | ||
| if other_is_booleanarray: | ||
| other, mask = other._data, other._mask | ||
| elif is_list_like(other): | ||
| other = np.asarray(other, dtype="bool") | ||
| if other.ndim > 1: | ||
| raise NotImplementedError( | ||
| "can only perform ops with 1-d structures" | ||
| ) | ||
| if len(self) != len(other): | ||
| raise ValueError("Lengths must match to compare") | ||
| other, mask = coerce_to_array(other, copy=False) | ||
| elif isinstance(other, np.bool_): | ||
| other = other.item() | ||
Member: this is to convert to a python bool? why not just
Member: But Tom, why is it exactly needed to convert this? I would think the numpy operations later on work fine with a numpy scalar as well?
Contributor Author: IIRC, we do things like
| | ||
| if other_is_scalar and not (other is libmissing.NA or lib.is_bool(other)): | ||
| raise TypeError( | ||
| "'other' should be pandas.NA or a bool. Got {} instead.".format( | ||
| type(other).__name__ | ||
| ) | ||
| ) | ||
| | ||
| # numpy will show a DeprecationWarning on invalid elementwise | ||
| # comparisons, this will raise in the future | ||
| with warnings.catch_warnings(): | ||
| warnings.filterwarnings("ignore", "elementwise", FutureWarning) | ||
| with np.errstate(all="ignore"): | ||
| result = op(self._data, other) | ||
| if not other_is_scalar and len(self) != len(other): | ||
| raise ValueError("Lengths must match to compare") | ||
| | ||
| # nans propagate | ||
| if mask is None: | ||
| mask = self._mask | ||
| else: | ||
| mask = self._mask | mask | ||
| if op.__name__ in {"or_", "ror_"}: | ||
| result, mask = ops.kleene_or(self._data, other, self._mask, mask) | ||
| elif op.__name__ in {"and_", "rand_"}: | ||
| result, mask = ops.kleene_and(self._data, other, self._mask, mask) | ||
| elif op.__name__ in {"xor", "rxor"}: | ||
| result, mask = ops.kleene_xor(self._data, other, self._mask, mask) | ||
| | ||
| return BooleanArray(result, mask) | ||
| | ||
| | ||