ENH: partial string match in query #26027
…tring-match-in-query
Codecov Report
@@             Coverage Diff             @@
##           master   #26027       +/-   ##
===========================================
- Coverage   91.82%   40.72%   -51.11%
===========================================
  Files         175      175
  Lines       52539    52554       +15
===========================================
- Hits        48246    21404    -26842
- Misses       4293    31150    +26857
Codecov Report
@@            Coverage Diff             @@
##           master   #26027      +/-   ##
==========================================
+ Coverage   91.82%   91.95%   +0.12%
==========================================
  Files         175      175
  Lines       52539    52427     -112
==========================================
- Hits        48246    48211      -35
+ Misses       4293     4216      -77
| partial_str_match : bool, optional, default False | ||
| If this is True, an `expr` like "string_query in list_like_of_strings" | ||
| is interpreted as partial string match (the default behavior is exact | ||
| matching). |
Add a "versionadded" tag in the docstring.
Thank you, I added it.
| # equality | ||
| res1 = df.query('color == "red"', parser=parser, engine=engine) | ||
| res2 = df.query('"red" == color', parser=parser, engine=engine) | ||
| res1 = df.query('color == "red"', **kwargs) |
Can you just be explicit about the parameter being passed instead of using kwargs?
pandas/core/computation/eval.py Outdated
| def eval(expr, parser='pandas', engine=None, truediv=True, | ||
| local_dict=None, global_dict=None, resolvers=(), level=0, | ||
| target=None, inplace=False): | ||
| target=None, inplace=False, partial_str_match=False): |
Can you add a type annotation for the new keyword?
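For illustration, an annotated version of the new keyword might look like the sketch below. `eval_stub` is a hypothetical stand-in with a shortened parameter list, not the real `pandas.eval` signature; only the `partial_str_match: bool = False` part reflects what the comment asks for.

```python
from typing import Optional, get_type_hints


def eval_stub(expr: str, parser: str = "pandas",
              engine: Optional[str] = None,
              partial_str_match: bool = False) -> bool:
    """Hypothetical stand-in for the signature under review (not pandas code)."""
    # The body is a placeholder: it only echoes the flag so the sketch runs.
    return partial_str_match


# The annotation is visible to tooling and to readers of the signature.
print(get_type_hints(eval_stub)["partial_str_match"])  # prints: <class 'bool'>
```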
pandas/core/computation/expr.py Outdated
| preparser=preparser) | ||
| def __init__(self, env, engine, parser, preparser=lambda x: x, | ||
| partial_str_match=False): | ||
| super(PythonExprVisitor, self).__init__( |
Don't need Py2 compat call any more so just super() should be fine
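A minimal sketch of the suggested change, using hypothetical class names (`BaseVisitorSketch`, `PythonVisitorSketch`) rather than the real visitor classes in `expr.py`: in Python 3 the zero-argument `super()` resolves the class and instance automatically.

```python
class BaseVisitorSketch:
    """Hypothetical base class; stands in for the real visitor's parent."""

    def __init__(self, env, engine, parser, preparser=lambda x: x):
        self.env = env
        self.engine = engine
        self.parser = parser
        self.preparser = preparser


class PythonVisitorSketch(BaseVisitorSketch):
    """Hypothetical subclass that adds the new keyword."""

    def __init__(self, env, engine, parser, preparser=lambda x: x,
                 partial_str_match=False):
        # Python 3 form: no need to repeat the class name and `self`,
        # as in super(PythonVisitorSketch, self).__init__(...)
        super().__init__(env, engine, parser, preparser=preparser)
        self.partial_str_match = partial_str_match


visitor = PythonVisitorSketch(env=None, engine="python", parser="pandas",
                              partial_str_match=True)
print(visitor.engine, visitor.partial_str_match)  # prints: python True
```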
| return request.param | ||
| | ||
| | ||
| @pytest.fixture(params=[False, True]) |
Does this need to be a fixture? I think it would be better served to have a generic True / False fixture instead of duplicating this for particular parameters (if one doesn't already exist)
Thank you for your comment. I could not find such a generic boolean fixture in tests/**/*.py. What do you think about adding, say, a boolean fixture in tests/frame/common.py? Or am I missing your point?
| a = np.random.choice(['red', 'green'], size=10) | ||
| b = np.random.choice(['eggs', 'ham'], size=10) | ||
| a = np.random.choice(['red', 'a_red', 'a_red_a', | ||
| 'red_a', 'Red', 'green'], size=30) |
Why were the sizes changed?
| res1 = df.query('["red"] in color', **kwargs) | ||
| res2 = df.query('"red" in color', **kwargs) | ||
| exp1 = df[ind.isin(['red'])] | ||
| if partial_str_match: |
This test is pretty huge already. I think it would be better to create a separate test for this
…np2/pandas into feature/partial-string-match-in-query
| I am -1 on this idea generally. This is very easily implemented using the |
git diff upstream/master -u -- "*.py" | flake8 --diff
This PR proposes adding partial string matching to the `query` method. In the proposed implementation, a query like `df.query('"alice" in person_name', partial_str_match=True)` returns the rows whose 'person_name' contains 'alice'. Other kinds of queries (e.g. `'["alice", "bob"] in person_name'` or `'age == 20'`) with `partial_str_match=True`, and all queries with `partial_str_match=False` (the default, for backward compatibility), return the same results as before.
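To make the difference between the two matching modes concrete, here is a rough sketch that uses only existing pandas calls (`Series.isin` and `Series.str.contains`) instead of the proposed `partial_str_match` keyword, which exists only in this PR. The sample data is made up for illustration. Note that `str.contains` is case-sensitive by default; how the proposed option treats case is not stated in the description above.

```python
import pandas as pd

# Made-up sample data for illustration only.
df = pd.DataFrame({
    "person_name": ["alice", "alice smith", "bob", "Alice"],
    "age": [20, 31, 20, 45],
})

# Exact matching: a row is kept only if the whole value equals "alice".
exact = df[df["person_name"].isin(["alice"])]

# Partial matching: a row is kept if the value contains "alice" anywhere.
# regex=False treats the pattern as a literal substring.
partial = df[df["person_name"].str.contains("alice", regex=False)]

print(exact["person_name"].tolist())    # prints: ['alice']
print(partial["person_name"].tolist())  # prints: ['alice', 'alice smith']
```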