BUG: Raise error when expr does not evaluate to bool in df.query #46862

NumberPiOso · 2022-04-25T01:34:28Z

closes df.query() doesn't follow python truthy-ness? #8560
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added an entry in the latest doc/source/whatsnew/v1.5.0.rst file if fixing a bug or adding a new feature.

NumberPiOso · 2022-04-25T01:58:54Z

Special attention to line 4139 in

Lines 4137 to 4141 in b8e1aa4

    if not is_bool_dtype(res):
        # Special condition to check when dealing with higher dimensions
        if not (res.ndim > 1 and (res.dtypes == bool).all()):
            msg = f"expr must evaluate to boolean not {res.dtypes}"
            raise ValueError(msg)
 

This condition was added so that some special tests that use higher-dimensional results keep passing. These tests are:

pandas/tests/frame/test_query_eval.py::TestDataFrameQueryNumExprPandas::test_nested_scope
pandas/tests/frame/test_query_eval.py::TestDataFrameQueryPythonPandas::test_nested_scope

This is because these tests produce a res for which is_bool_dtype(res) in the first condition (line 4137) returns False, even though every value in it is boolean:

    res =
           0      1      2
    0   True  False  False
    1  False  False  False
    2  False  False   True
    3   True   True  False
    4  False   True   True

So in this case we check that every column is of bool dtype.
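The behaviour described here can be reproduced outside of frame.py. The following is a minimal sketch, assuming the public `pandas.api.types.is_bool_dtype` behaves like the internal import used in the PR; the frame copies the `res` values shown above, and the variable names `frame_level` and `column_level` are illustrative only.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype

# Same values as the res produced by test_nested_scope (see above).
res = pd.DataFrame(
    [
        [True, False, False],
        [False, False, False],
        [False, False, True],
        [True, True, False],
        [False, True, True],
    ]
)

# A DataFrame has no single dtype, so the frame-level check does not
# report it as boolean even though every column is.
frame_level = is_bool_dtype(res)

# The column-by-column comparison used in the special condition.
column_level = bool((res.dtypes == bool).all())

print(frame_level, column_level)
```

This is why the 2-D case needs its own branch: the frame-level check alone would reject a result that is in fact entirely boolean.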

NumberPiOso · 2022-04-26T21:51:34Z

pandas/core/frame.py

+ if res.ndim == 1:
+     is_bool_result = is_bool_dtype(res)
+ elif res.ndim > 1:
+     is_bool_result = all(is_bool_dtype(x) for x in res.dtypes)


I am not completely sure whether this generalizes to higher dimensions, or whether I should fix the condition to res.ndim == 2 for improved readability.

We only offer up to 2-dimensional objects in pandas, so this should be fine, though there might also be a more explicit way to branch here with typing. @simonjayhawkins

you don't need any of this, you can call common.is_bool_indexer

@jreback
It correctly raises when they are not bool. But we are back to the problems with these tests:
pandas/tests/frame/test_query_eval.py::TestDataFrameQueryNumExprPandas::test_nested_scope
pandas/tests/frame/test_query_eval.py::TestDataFrameQueryPythonPandas::test_nested_scope

The first one produces a res like the one below:

    import pandas as pd
    from pandas.core.dtypes.common import is_bool_dtype
    from pandas.core import common as com

    res = pd.DataFrame([
        [False, False, False],
        [False, True, False],
        [False, False, False],
        [False, False, False],
        [False, False, False],
    ])

    if res.ndim == 1:
        is_bool_result = is_bool_dtype(res)
    elif res.ndim > 1:
        is_bool_result = all(is_bool_dtype(x) for x in res.dtypes)

    new_is_bool_result = com.is_bool_indexer(res)

    # is_bool_result: True
    # new_is_bool: False

OK, then how about we use it and then handle those cases? I really do not like reinventing the wheel.

The only cases is_bool_indexer handles are Series, ndarray, Index, and lists. I don't see anything in dtypes or common that handles DataFrames. I'm not sure if adding DataFrame to is_bool_indexer with its current uses is safe. Perhaps we could add is_bool_frame? The implementation above appears to be correct to me.
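To make the point about supported inputs concrete, here is a small sketch that calls `is_bool_indexer` on each container type mentioned. Note that `pandas.core.common` is an internal module rather than public API, so this is for illustration only; the `results` dict is just a convenient way to collect the answers.

```python
import numpy as np
import pandas as pd
from pandas.core import common as com  # internal module, not public API

mask = [True, False, True]

# The same boolean mask wrapped in each container type.
results = {
    "list": com.is_bool_indexer(mask),
    "ndarray": com.is_bool_indexer(np.array(mask)),
    "Series": com.is_bool_indexer(pd.Series(mask)),
    "Index": com.is_bool_indexer(pd.Index(mask)),
    "DataFrame": com.is_bool_indexer(pd.DataFrame({"a": mask})),
}

for kind, is_bool in results.items():
    print(f"{kind}: {is_bool}")
```

Only the DataFrame falls through, which matches the `new_is_bool` result reported earlier in this thread.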

With the new implementation that takes into account #46862 (comment) I feel we need this if/else.

In general, it seems logical to me that these cases are separated.

But if you do not agree, I would like to understand the proposed modification a little better. Is this the implementation you imagined?

    if not is_bool_frame(res):
        msg = f"expr must evaluate to boolean not {res.dtypes}"
        raise ValueError(msg)

    try:
        result = self.loc[res]
    except ValueError:
        # when res is multi-dimensional loc raises, but this is sometimes a
        # valid query
        result = self[res]
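For reference, a helper along the lines suggested in this thread might look like the sketch below. `is_bool_frame` is hypothetical (pandas does not ship a function with this name), and `check_query_result` is an illustrative wrapper, not code from the PR. It assumes the public `pandas.api.types.is_bool_dtype`.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype


def is_bool_frame(obj) -> bool:
    """Hypothetical helper: True if obj holds only boolean data."""
    if isinstance(obj, pd.DataFrame):
        # A frame has one dtype per column, so check each of them.
        return all(is_bool_dtype(dtype) for dtype in obj.dtypes)
    # Series and other 1-D inputs have a single dtype.
    return bool(is_bool_dtype(obj))


def check_query_result(res) -> None:
    """Illustrative guard mirroring the error message used in the PR."""
    if not is_bool_frame(res):
        raise ValueError(f"expr must evaluate to boolean not {res.dtypes}")


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
check_query_result(df["a"] > 1)  # 1-D boolean result passes
check_query_result(df > 2)       # 2-D boolean result passes
```

One design point to be aware of: a frame with no columns passes this check, because `all()` over an empty sequence is True.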

jreback · 2022-04-27T12:43:22Z

pandas/core/frame.py

+ if res.ndim == 1:
+     is_bool_result = is_bool_dtype(res)
+ elif res.ndim > 1:
+     is_bool_result = all(is_bool_dtype(x) for x in res.dtypes)


you don't need any of this, you can call common.is_bool_indexer

simonjayhawkins · 2022-05-04T14:37:44Z

pandas/core/frame.py

+     is_bool_result = all(is_bool_dtype(x) for x in res.dtypes)
+ if not is_bool_result:
+     msg = f"expr must evaluate to boolean not {res.dtypes}"
+     raise ValueError(msg)


Out of curiosity, as I have not looked at this in detail: if we are ensuring a bool indexer, is the following try/except still needed?

Indeed it is a common problem with the multidimensional res, so we can share the solution.
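One possible way to drop the try/except is to branch on `ndim` once the result is known to be boolean. The sketch below is an assumption about how that could look, not the code that was pushed to this PR; `select_with_mask` is a hypothetical name.

```python
import pandas as pd


def select_with_mask(df: pd.DataFrame, res) -> pd.DataFrame:
    """Hypothetical helper: apply an already-validated boolean result."""
    if res.ndim > 1:
        # .loc rejects a 2-D key, so mask element-wise instead;
        # entries where res is False become NaN.
        return df[res]
    # 1-D boolean result: ordinary row selection.
    return df.loc[res]


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
rows = select_with_mask(df, df["a"] > 1)  # keeps the rows where a > 1
masked = select_with_mask(df, df > 2)     # same shape as df, NaN where False
```

The explicit branch makes the two behaviours visible at the call site instead of relying on `.loc` raising for the 2-D case.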

NumberPiOso · 2022-05-23T00:55:14Z

Sorry for the long delay, I had a couple of hard weeks. From now on I can and will iterate a lot quicker.

github-actions · 2022-07-11T00:09:01Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

mroeschke · 2022-07-22T17:48:39Z

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

NumberPiOso added 4 commits April 24, 2022 20:50

Add failing test to df.query

cca41f3

Report error when non boolean dtype df.query

4e99d29

Special case when res.dim > 1

fffdcc8

Add whatsnew entry

b8e1aa4

NumberPiOso force-pushed the query-error branch from daf9245 to b8e1aa4 Compare April 25, 2022 01:50

gurashish1singh reviewed Apr 25, 2022

Fix boolean typo

3ae076c

rhshadrach requested changes Apr 25, 2022

rhshadrach added Bug expressions pd.eval, query labels Apr 25, 2022

NumberPiOso added 2 commits April 26, 2022 16:47

Refactor checking boolean df.query

b0224a7

Add raising error test ndim=2 in df.query

2ba87ed

NumberPiOso commented Apr 26, 2022

NumberPiOso requested a review from rhshadrach April 26, 2022 21:51

jreback requested changes Apr 27, 2022

NumberPiOso requested a review from jreback April 28, 2022 15:00

simonjayhawkins reviewed May 4, 2022

Refactor to avoid try: except:

f4a43e5

NumberPiOso requested a review from simonjayhawkins May 23, 2022 01:20

Merge branch 'main' into query-error

fddf50a

github-actions bot added the Stale label Jul 11, 2022

mroeschke closed this Jul 22, 2022


WillAyd Apr 26, 2022

jreback Apr 27, 2022

NumberPiOso Apr 27, 2022

jreback May 8, 2022

rhshadrach May 11, 2022

NumberPiOso May 23, 2022
