Skip to content

Conversation

@dlnp2
Copy link

@dlnp2 dlnp2 commented Apr 8, 2019

This PR proposes to add partial string matching functionality to query method. In the proposed implementation, a query like df.query('"alice" in person_name', partial_str_match=True) returns rows with 'person_name' containing 'alice' in them. For other kind of queries (e.g. '["alice", "bob"] in person_name' or 'age == 20') with partial_str_match=True or those with partial_str_match=False (which is set to default for backward compatibility), results as before are returned.

@codecov
Copy link

codecov bot commented Apr 8, 2019

Codecov Report

Merging #26027 into master will decrease coverage by 51.1%.
The diff coverage is 34.78%.

Impacted file tree graph

@@ Coverage Diff @@ ## master #26027 +/- ## =========================================== - Coverage 91.82% 40.72% -51.11%  =========================================== Files 175 175 Lines 52539 52554 +15 =========================================== - Hits 48246 21404 -26842  - Misses 4293 31150 +26857
Flag Coverage Δ
#multiple ?
#single 40.72% <34.78%> (-0.14%) ⬇️
Impacted Files Coverage Δ
pandas/core/computation/eval.py 13.59% <ø> (-83.5%) ⬇️
pandas/core/computation/expr.py 61.63% <34.78%> (-26.93%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
pandas/io/formats/html.py 0% <0%> (-99.37%) ⬇️
pandas/io/sas/sas7bdat.py 0% <0%> (-91.16%) ⬇️
pandas/io/sas/sas_xport.py 0% <0%> (-90.1%) ⬇️
... and 132 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6de8133...943615c. Read the comment docs.

@codecov
Copy link

codecov bot commented Apr 8, 2019

Codecov Report

Merging #26027 into master will increase coverage by 0.12%.
The diff coverage is 100%.

Impacted file tree graph

@@ Coverage Diff @@ ## master #26027 +/- ## ========================================== + Coverage 91.82% 91.95% +0.12%  ========================================== Files 175 175 Lines 52539 52427 -112 ========================================== - Hits 48246 48211 -35  + Misses 4293 4216 -77
Flag Coverage Δ
#multiple 90.51% <100%> (+0.13%) ⬆️
#single 40.72% <58.33%> (-0.14%) ⬇️
Impacted Files Coverage Δ
pandas/core/computation/eval.py 97.08% <ø> (ø) ⬆️
pandas/core/computation/expr.py 96.83% <100%> (+8.27%) ⬆️
pandas/io/gbq.py 75% <0%> (-12.5%) ⬇️
pandas/util/_doctools.py 12% <0%> (-0.88%) ⬇️
pandas/core/sparse/frame.py 95.49% <0%> (-0.21%) ⬇️
pandas/plotting/_core.py 83.76% <0%> (-0.09%) ⬇️
pandas/io/common.py 91.83% <0%> (-0.05%) ⬇️
pandas/core/computation/ops.py 95.62% <0%> (-0.04%) ⬇️
pandas/core/computation/align.py 97.8% <0%> (-0.03%) ⬇️
pandas/core/reshape/melt.py 97.47% <0%> (-0.03%) ⬇️
... and 47 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6de8133...b120f59. Read the comment docs.

@gfyoung gfyoung added API Design Strings String extension data type and string data labels Apr 8, 2019
partial_str_match : bool, optional, default False
If this is True, an `expr` like "string_query in list_like_of_strings"
is interpreted as partial string match (the default behavior is exact
matching).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a "versionadded" tag in the docstring.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I added.

@gfyoung gfyoung requested a review from jreback April 8, 2019 18:35
# equality
res1 = df.query('color == "red"', parser=parser, engine=engine)
res2 = df.query('"red" == color', parser=parser, engine=engine)
res1 = df.query('color == "red"', **kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just be explicit about the parameter being passed instead of using kwargs?

def eval(expr, parser='pandas', engine=None, truediv=True,
local_dict=None, global_dict=None, resolvers=(), level=0,
target=None, inplace=False):
target=None, inplace=False, partial_str_match=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a type annotation for the new keyword?

preparser=preparser)
def __init__(self, env, engine, parser, preparser=lambda x: x,
partial_str_match=False):
super(PythonExprVisitor, self).__init__(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need Py2 compat call any more so just super() should be fine

return request.param


@pytest.fixture(params=[False, True])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be a fixture? I think it would be better served to have a generic True / False fixture instead of duplicating this for particular parameters (if one doesn't already exist)

Copy link
Author

@dlnp2 dlnp2 Apr 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd

Thank you for your comment. I could not find such a generic boolean fixture in tests/**/*.py. How do you think about adding, say, def boolean fixture in tests/frame/common.py? Am I missing your point?

a = np.random.choice(['red', 'green'], size=10)
b = np.random.choice(['eggs', 'ham'], size=10)
a = np.random.choice(['red', 'a_red', 'a_red_a',
'red_a', 'Red', 'green'], size=30)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why were the sizes changed?

res1 = df.query('["red"] in color', **kwargs)
res2 = df.query('"red" in color', **kwargs)
exp1 = df[ind.isin(['red'])]
if partial_str_match:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is pretty huge already. I think it would be better to create a separate test for this

@jreback
Copy link
Contributor

jreback commented Jun 8, 2019

I am -1 on this idea generally. This is very easily implemented using the .str methods and .query doesn't need even more magic (note there is NO benefit to using string methods in .query anyhow as numexpr is not used).

@jreback jreback closed this Jun 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

API Design Strings String extension data type and string data

4 participants