ENH: partial string match in query #26027

dlnp2 · 2019-04-08T14:18:36Z

closes N/A (previously proposed in "Support for partial string matching in query" #8749, but not implemented so far)
tests added / passed
passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
whatsnew entry

This PR proposes adding partial string matching to the `query` method. With the proposed implementation, a query like `df.query('"alice" in person_name', partial_str_match=True)` returns the rows whose 'person_name' contains 'alice'. All other queries return the same results as before. This covers other kinds of queries (e.g. `'["alice", "bob"] in person_name'` or `'age == 20'`) even with `partial_str_match=True`. It also covers any query with `partial_str_match=False`, which is the default for backward compatibility.
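`partial_str_match` was proposed here but never merged, so released pandas has no such option. The PR itself changed the expression visitor in `pandas/core/computation/expr.py`. The helper `query_with_partial` below is hypothetical and is not that implementation. It is a minimal sketch of the described semantics only, assuming Python 3.8+ (for `ast.Constant`) and made-up sample data. Only the `"<string>" in <column>` form is rewritten to a substring match. Everything else falls through to the ordinary `query`.

```python
import ast

import pandas as pd


def query_with_partial(df, expr, partial_str_match=False):
    # Hypothetical helper emulating the proposed option; not the PR's code.
    if partial_str_match:
        node = ast.parse(expr, mode="eval").body
        # Only the form  "<string>" in <column>  is treated as a partial match
        if (isinstance(node, ast.Compare)
                and len(node.ops) == 1
                and isinstance(node.ops[0], ast.In)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.left.value, str)
                and isinstance(node.comparators[0], ast.Name)):
            column = node.comparators[0].id
            # Literal (non-regex), case-sensitive substring test
            mask = df[column].str.contains(node.left.value, regex=False)
            return df[mask]
    # Flag off, list membership, numeric comparisons, ...: plain query()
    return df.query(expr, engine="python")


df = pd.DataFrame({
    "person_name": ["alice", "alice smith", "bob", "Alice", "carol"],
    "age": [20, 31, 20, 45, 28],
})

# rows: alice, alice smith
print(query_with_partial(df, '"alice" in person_name', partial_str_match=True))
# rows: alice, bob (exact list membership, unchanged)
print(query_with_partial(df, '["alice", "bob"] in person_name', partial_str_match=True))
# rows: alice, bob (age == 20, unchanged)
print(query_with_partial(df, "age == 20", partial_str_match=True))
```

Parsing with `ast` keeps the sketch from misreading list or numeric expressions as substring searches. That mirrors the PR's intent that only the scalar-string `in` form changes meaning.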


codecov · 2019-04-08T16:49:09Z

Codecov Report

Merging #26027 into master will decrease coverage by 51.1%.
The diff coverage is 34.78%.

@@            Coverage Diff             @@
##           master   #26027       +/-   ##
===========================================
- Coverage   91.82%   40.72%   -51.11%
===========================================
  Files         175      175
  Lines       52539    52554       +15
===========================================
- Hits        48246    21404    -26842
- Misses       4293    31150    +26857

Flag	Coverage Δ
#multiple	`?`
#single	`40.72% <34.78%> (-0.14%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/computation/eval.py	`13.59% <ø> (-83.5%)`	⬇️
pandas/core/computation/expr.py	`61.63% <34.78%> (-26.93%)`	⬇️
pandas/io/formats/latex.py	`0% <0%> (-100%)`	⬇️
pandas/io/sas/sas_constants.py	`0% <0%> (-100%)`	⬇️
pandas/core/groupby/categorical.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/plotting.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/converter.py	`0% <0%> (-100%)`	⬇️
pandas/io/formats/html.py	`0% <0%> (-99.37%)`	⬇️
pandas/io/sas/sas7bdat.py	`0% <0%> (-91.16%)`	⬇️
pandas/io/sas/sas_xport.py	`0% <0%> (-90.1%)`	⬇️
... and 132 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6de8133...943615c.

codecov · 2019-04-08T16:49:12Z

Codecov Report

Merging #26027 into master will increase coverage by 0.12%.
The diff coverage is 100%.

@@            Coverage Diff            @@
##           master   #26027     +/-   ##
==========================================
+ Coverage   91.82%   91.95%   +0.12%
==========================================
  Files         175      175
  Lines       52539    52427    -112
==========================================
- Hits        48246    48211     -35
+ Misses       4293     4216     -77

Flag	Coverage Δ
#multiple	`90.51% <100%> (+0.13%)`	⬆️
#single	`40.72% <58.33%> (-0.14%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/computation/eval.py	`97.08% <ø> (ø)`	⬆️
pandas/core/computation/expr.py	`96.83% <100%> (+8.27%)`	⬆️
pandas/io/gbq.py	`75% <0%> (-12.5%)`	⬇️
pandas/util/_doctools.py	`12% <0%> (-0.88%)`	⬇️
pandas/core/sparse/frame.py	`95.49% <0%> (-0.21%)`	⬇️
pandas/plotting/_core.py	`83.76% <0%> (-0.09%)`	⬇️
pandas/io/common.py	`91.83% <0%> (-0.05%)`	⬇️
pandas/core/computation/ops.py	`95.62% <0%> (-0.04%)`	⬇️
pandas/core/computation/align.py	`97.8% <0%> (-0.03%)`	⬇️
pandas/core/reshape/melt.py	`97.47% <0%> (-0.03%)`	⬇️
... and 47 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6de8133...b120f59.

gfyoung · 2019-04-08T18:35:09Z

pandas/core/computation/eval.py

+ partial_str_match : bool, optional, default False
+ If this is True, an `expr` like "string_query in list_like_of_strings"
+ is interpreted as partial string match (the default behavior is exact
+ matching).


Add a "versionadded" tag in the docstring.

Thank you, I've added it.
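The requested tag is the Sphinx `.. versionadded::` directive placed under the parameter in a numpydoc-style docstring. Below is a sketch on a hypothetical stub, `eval_docstring_sketch`. The version number `0.25.0` is an assumption (the next pandas release at the time). The value actually used in the PR is not shown in this thread.

```python
def eval_docstring_sketch(expr, partial_str_match=False):
    """
    Evaluate a Python expression as a string (illustrative stub only).

    Parameters
    ----------
    expr : str
        The expression to evaluate.
    partial_str_match : bool, optional, default False
        If this is True, an `expr` like "string_query in list_like_of_strings"
        is interpreted as partial string match (the default behavior is exact
        matching).

        .. versionadded:: 0.25.0
    """
```

The directive is indented under the parameter it documents, so the rendered docs attach the version note to `partial_str_match` rather than to the whole function.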

WillAyd · 2019-04-09T17:23:37Z

pandas/tests/frame/test_query_eval.py

 # equality
- res1 = df.query('color == "red"', parser=parser, engine=engine)
- res2 = df.query('"red" == color', parser=parser, engine=engine)
+ res1 = df.query('color == "red"', **kwargs)


Can you just be explicit about the parameter being passed instead of using kwargs?
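Passing the options explicitly looks roughly like the sketch below. The helper name `check_str_equality_query` and its data are hypothetical. Only the existing `parser` and `engine` arguments of `DataFrame.query` are used, since `partial_str_match` does not exist in released pandas.

```python
import pandas as pd


def check_str_equality_query(parser, engine):
    # parser and engine are passed explicitly instead of via **kwargs,
    # so a reader sees exactly which query() options the test exercises
    df = pd.DataFrame({"color": ["red", "green", "red", "Red"]})
    expected = df[df["color"] == "red"]

    # equality, with the string on either side
    res1 = df.query('color == "red"', parser=parser, engine=engine)
    res2 = df.query('"red" == color', parser=parser, engine=engine)
    pd.testing.assert_frame_equal(res1, expected)
    pd.testing.assert_frame_equal(res2, expected)
    return res1


check_str_equality_query(parser="pandas", engine="python")
```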

WillAyd · 2019-04-09T17:23:49Z

pandas/core/computation/eval.py

 def eval(expr, parser='pandas', engine=None, truediv=True,
 local_dict=None, global_dict=None, resolvers=(), level=0,
- target=None, inplace=False):
+ target=None, inplace=False, partial_str_match=False):


Can you add a type annotation for the new keyword?
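Annotating only the new keyword could look like this. `eval_signature_sketch` is a hypothetical stand-in that copies the signature from the diff above. Its body just echoes the flag so that `inspect` can show the annotation and default.

```python
import inspect


def eval_signature_sketch(expr, parser='pandas', engine=None, truediv=True,
                          local_dict=None, global_dict=None, resolvers=(),
                          level=0, target=None, inplace=False,
                          partial_str_match: bool = False):
    # Hypothetical stand-in for eval(); only the signature matters here
    return partial_str_match


param = inspect.signature(eval_signature_sketch).parameters["partial_str_match"]
print(param.annotation, param.default)  # <class 'bool'> False
```

PEP 8 puts spaces around `=` for an annotated parameter with a default (`partial_str_match: bool = False`), unlike the unannotated keywords.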

WillAyd · 2019-04-09T17:25:05Z

pandas/core/computation/expr.py

- preparser=preparser)
+ def __init__(self, env, engine, parser, preparser=lambda x: x,
+ partial_str_match=False):
+ super(PythonExprVisitor, self).__init__(


Don't need Py2 compat call any more so just super() should be fine
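The Python 3 form of the call can be sketched with simplified, hypothetical stand-ins for the visitor classes. They keep only the constructor arguments shown in the diff and are not the real pandas classes.

```python
class BaseExprVisitorSketch:
    # Simplified, hypothetical stand-in for the base expression visitor
    def __init__(self, env, engine, parser, preparser=lambda x: x):
        self.env = env
        self.engine = engine
        self.parser = parser
        self.preparser = preparser


class PythonExprVisitorSketch(BaseExprVisitorSketch):
    def __init__(self, env, engine, parser, preparser=lambda x: x,
                 partial_str_match=False):
        # Python 3: zero-argument super() replaces
        # super(PythonExprVisitorSketch, self)
        super().__init__(env, engine, parser, preparser=preparser)
        self.partial_str_match = partial_str_match


visitor = PythonExprVisitorSketch(env=None, engine="python", parser="python",
                                  partial_str_match=True)
```

The zero-argument form also keeps working if the class is renamed, which the explicit two-argument form does not.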

WillAyd · 2019-04-09T17:26:00Z

pandas/tests/frame/test_query_eval.py

 return request.param


+@pytest.fixture(params=[False, True])


Does this need to be a fixture? I think it would be better served to have a generic True / False fixture instead of duplicating this for particular parameters (if one doesn't already exist).

@WillAyd

Thank you for your comment. I could not find such a generic boolean fixture in tests/**/*.py. What do you think about adding, say, a `boolean` fixture to tests/frame/common.py? Or am I missing your point?
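A generic True / False fixture of the kind discussed might look like the sketch below. The name `boolean` comes from the reply above, and `test_query_inplace` is a hypothetical consumer. It toggles the existing `inplace` option of `DataFrame.query`, because `partial_str_match` is not available. pytest shares fixtures defined in a `conftest.py` with all tests below that directory. Where pandas would actually place such a fixture is not settled in this thread.

```python
import pandas as pd
import pytest


@pytest.fixture(params=[False, True])
def boolean(request):
    # Generic True/False fixture: every test that requests `boolean`
    # runs once with False and once with True
    return request.param


def test_query_inplace(boolean):
    # Hypothetical consumer; inplace is an existing boolean option of query()
    df = pd.DataFrame({"color": ["red", "green", "red"]})
    expected = df[df["color"] == "red"]

    result = df.query('color == "red"', engine="python", inplace=boolean)
    if boolean:
        # inplace=True filters df itself and returns None
        assert result is None
        result = df
    pd.testing.assert_frame_equal(result, expected)
```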

WillAyd · 2019-04-09T17:26:26Z

pandas/tests/frame/test_query_eval.py

- a = np.random.choice(['red', 'green'], size=10)
- b = np.random.choice(['eggs', 'ham'], size=10)
+ a = np.random.choice(['red', 'a_red', 'a_red_a',
+ 'red_a', 'Red', 'green'], size=30)


Why were the sizes changed?

WillAyd · 2019-04-09T17:27:03Z

pandas/tests/frame/test_query_eval.py

+ res1 = df.query('["red"] in color', **kwargs)
+ res2 = df.query('"red" in color', **kwargs)
+ exp1 = df[ind.isin(['red'])]
+ if partial_str_match:


This test is pretty huge already. I think it would be better to create a separate test for this.
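A separate, smaller test could look like the sketch below. The color values come from the PR's diff. They are repeated deterministically here instead of sampled with `np.random.choice`, which is a choice made for the sketch. The test name is hypothetical. Because the proposed option was not merged, the test pins down existing exact-match behavior and builds the substring match with `.str.contains`.

```python
import pandas as pd


def test_query_in_list_is_exact_match():
    # Color values from the PR's test diff, repeated deterministically
    colors = ["red", "a_red", "a_red_a", "red_a", "Red", "green"]
    df = pd.DataFrame({"color": colors * 5})

    # Existing behavior: membership in a list is an exact match
    exact = df.query('color in ["red"]', engine="python")
    pd.testing.assert_frame_equal(exact, df[df["color"].isin(["red"])])

    # Substring match via .str.contains (case-sensitive, so "Red" is excluded)
    partial = df[df["color"].str.contains("red", regex=False)]
    assert set(partial["color"]) == {"red", "a_red", "a_red_a", "red_a"}
    assert len(exact) == 5 and len(partial) == 20
```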


jreback · 2019-06-08T20:23:34Z

I am -1 on this idea generally. This is very easily implemented using the .str methods and .query doesn't need even more magic (note there is NO benefit to using string methods in .query anyhow as numexpr is not used).
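The `.str`-based alternative can be sketched as follows, with made-up sample data. The first form uses ordinary boolean indexing. The second writes the same filter inside `query` and passes `engine="python"` explicitly, in line with the note that numexpr is not used for string methods.

```python
import pandas as pd

df = pd.DataFrame({
    "person_name": ["alice", "alice smith", "bob", "Alice", "carol"],
    "age": [20, 31, 20, 45, 28],
})

# Partial match with the .str accessor and ordinary boolean indexing
by_mask = df[df["person_name"].str.contains("alice", regex=False) & (df["age"] > 25)]

# The same filter inside query(), evaluated by the python engine
by_query = df.query('person_name.str.contains("alice") and age > 25',
                    engine="python")

print(by_mask["person_name"].tolist())  # ['alice smith']
```

Either form covers the use case without a new `query` keyword. That is the substance of the objection.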

dlnp2 added 8 commits April 5, 2019 02:23

ENH: add partial string matching to query

472bc48

restrict ops to In or NotIn

a81f374

add an option for backward compatibility

1cb9d47

add test for string query with multi index

085eab2

add test cases for query string

8d226b1

set partial match parameter as an instance variable

4dd0e66

Merge remote-tracking branch 'upstream/master' into feature/partial-string-match-in-query

ff8d5d6

remove str compat type

943615c

gfyoung added the API Design and Strings labels Apr 8, 2019

gfyoung reviewed Apr 8, 2019


gfyoung requested a review from jreback April 8, 2019 18:35

add versionadded tag

6e535ba

WillAyd requested changes Apr 9, 2019


dlnp2 added 4 commits April 15, 2019 19:03

add type annotation

b46fcdb

use py3 super

6463780

explicitly describe parameters instead of kwargs

86ca932

Merge branch 'feature/partial-string-match-in-query' of github.com:dlnp2/pandas into feature/partial-string-match-in-query

b120f59

jreback closed this Jun 8, 2019

4 participants

Uh oh!

ENH: partial string match in query #26027

ENH: partial string match in query #26027

Uh oh!

Conversation

dlnp2 commented Apr 8, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

codecov bot commented Apr 8, 2019

Codecov Report

codecov bot commented Apr 8, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dlnp2 Apr 15, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Jun 8, 2019

Labels

4 participants

dlnp2 commented Apr 8, 2019 •

edited

Loading

codecov bot commented Apr 8, 2019 •

edited

Loading

dlnp2 Apr 15, 2019 •

edited

Loading