Skip to content
Prev Previous commit
Next Next commit
Refactor to avoid try: except:
  • Loading branch information
NumberPiOso committed May 23, 2022
commit f4a43e561adfadf6f8c8297a1e93111e2c459bc9
18 changes: 8 additions & 10 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -4134,21 +4134,19 @@ def query(self, expr: str, inplace: bool = False, **kwargs):
kwargs["level"] = kwargs.pop("level", 0) + 1
kwargs["target"] = None
res = self.eval(expr, **kwargs)

if res.ndim == 1:
is_res_unidimensional = res.ndim == 1
if is_res_unidimensional:
is_bool_result = is_bool_dtype(res)
elif res.ndim > 1:
res_accesser = self.loc
else:
is_bool_result = all(is_bool_dtype(x) for x in res.dtypes)
Copy link
Contributor Author

@NumberPiOso NumberPiOso Apr 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not completely sure if this generalizes to higher dimensions or I can fix the condition to res.ndim == 2 for improved readibility

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only offer up to 2 dimensional objects in pandas so should be fine, though there also might be a more explicit way with typing to branch here. @simonjayhawkins

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need any of list, you can call common.is_bool_indexer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback
It correctly raises when they are not bool. But we go back to problems with tests
pandas/tests/frame/test_query_eval.py::TestDataFrameQueryNumExprPandas::test_nested_scope
pandas/tests/frame/test_query_eval.py::TestDataFrameQueryPythonPandas::test_nested_scope

The first one produces an output like the following res variable.

import pandas as pd from pandas.core.dtypes.common import is_bool_dtype from pandas.core import common as com res = pd.DataFrame([ [False, False, False], [False, True, False], [False, False, False], [False, False, False], [False, False, False], ]) if res.ndim == 1: is_bool_result = is_bool_dtype(res) elif res.ndim > 1: is_bool_result = all(is_bool_dtype(x) for x in res.dtypes) new_is_bool_result = com.is_bool_indexer(res) # is_bool_result: True # new_is_bool: False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok when how about we use it then handle those cases. I really do not like re-inventing the wheel.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only cases is_bool_indexer handles are Series, ndarray, Index, and lists. I don't see anything in dtypes or common that handles DataFrames. I'm not sure if adding DataFrame to is_bool_indexer with its current uses is safe. Perhaps we could add is_bool_frame? The implementation above appears to be correct to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the new implementation that takes into account #46862 (comment) I feel we need this if/else.

In general, it seems logical to me that these cases are separated.

But if you do not agree, I want to understand a little more the proposed modification. Is this the implementation you imagined?:

 if not is_bool_frame(res): msg = f"expr must evaluate to boolean not {res.dtypes}" raise ValueError(msg) try: result = self.loc[res] except ValueError: # when res is multi-dimensional loc raises, but this is sometimes a # valid query result = self[res]
# when res is multi-dimensional loc raises, but this is sometimes a
# valid query
res_accesser = self
if not is_bool_result:
msg = f"expr must evaluate to boolean not {res.dtypes}"
raise ValueError(msg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity, as not looked at this in detail, but if we are ensuring a bool indexer, is the following try/except still needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it is a common problem with the multidimensional res, so we can share the solution.

try:
result = self.loc[res]
except ValueError:
# when res is multi-dimensional loc raises, but this is sometimes a
# valid query
result = self[res]

result = res_accesser[res]
if inplace:
self._update_inplace(result)
return None
Expand Down