Skip to content

Conversation

@plammens
Copy link
Contributor

@plammens plammens commented Nov 3, 2020

This removes stacklevel requirements from the deprecation warnings for resample and Grouper kwargs in an attempt to simplify the warning-handling code (see #36629 for "why"). This only changes tests.resample.test_deprecated.

The second commit is optional (it just extracts the deprecated argument checking code out from Grouper.__new__ to declutter the latter); I can revert it if needed.

@plammens plammens force-pushed the remove-stacklevel-checks branch from 9f6a5d6 to c689f88 Compare November 3, 2020 15:44
@jreback jreback added the Testing pandas testing functions or related to the test suite label Nov 4, 2020
@plammens plammens marked this pull request as ready for review November 15, 2020 08:59
@jreback
Copy link
Contributor

jreback commented Nov 15, 2020

yes we should set some reasonable stack level
it will be right for the most common case

@plammens
Copy link
Contributor Author

plammens commented Nov 19, 2020

How about using the traceback module to inspect the call stack? We could extract e.g. the top 10 stack frames to be able to set an accurate stacklevel. Would this be too much overhead?

Actually this computation should only be done if the warnings do have to be raised, so relative to actually emitting the warning I'd say the overhead is minimal.

@jreback
Copy link
Contributor

jreback commented Nov 19, 2020

How about using the traceback module to inspect the call stack? We could extract e.g. the top 10 stack frames to be able to set an accurate stacklevel. Would this be too much overhead?

Actually this computation should only be done if the warnings do have to be raised, so relative to actually emitting the warning I'd say the overhead is minimal.

we don’t want the complexity

just hard code some reasonable defaults (iirc that’s how this was) and call it a day

@plammens
Copy link
Contributor Author

How about using the traceback module to inspect the call stack? We could extract e.g. the top 10 stack frames to be able to set an accurate stacklevel. Would this be too much overhead?
Actually this computation should only be done if the warnings do have to be raised, so relative to actually emitting the warning I'd say the overhead is minimal.

we don’t want the complexity

just hard code some reasonable defaults (iirc that’s how this was) and call it a day

Okay, I've reverted to how it was before (stacklevel is set to 2 or 4 depending on if called from Grouper or TimeGrouper).

@plammens plammens requested a review from jreback November 19, 2020 21:54
@jreback
Copy link
Contributor

jreback commented Nov 21, 2020

ok, going back to the original question. you want to disable the stacklevel checks because they are incorrect? or its fragile?

the entire deprecation warning will be removed in 2.0 anyhow.

@plammens
Copy link
Contributor Author

plammens commented Nov 22, 2020

ok, going back to the original question. you want to disable the stacklevel checks because they are incorrect? or its fragile?

Oh I made this PR because I thought you wanted to remove these tests 😮 #36629 (comment)

But to answer the question directly, it's because they're fragile. Or to be precise, because the way they are satisfied is fragile, and using a more robust solution is too complex for basically just a couple of deprecation warnings. (This is the discussion we had in #36629.)

@plammens plammens mentioned this pull request Nov 26, 2020
3 tasks
@jreback
Copy link
Contributor

jreback commented Nov 26, 2020

pls restore the actual stacklevels to what they were
i am ok with disabling some tests just not all of them

so iow i am not sure this PR is actually a net positive

@plammens
Copy link
Contributor Author

plammens commented Nov 28, 2020

pls restore the actual stacklevels to what they were

I've already done that! #37603 (comment)

i am ok with disabling some tests just not all of them

I didn't disable all of them, juste the ones for these specific warnings.

so iow i am not sure this PR is actually a net positive

You asked me to do exactly this in in #36629 (comment) and then again in #36369 (comment), please be clear, I'm getting very confused 😂

@plammens
Copy link
Contributor Author

Any updates on this @jreback? Did I misunderstand what you requested?

@jreback
Copy link
Contributor

jreback commented Dec 14, 2020

well the issue is this is changing what we have on master, which is passing now. can you explain again why this change is worth it?

@plammens
Copy link
Contributor Author

plammens commented Dec 15, 2020

well the issue is this is changing what we have on master, which is passing now. can you explain again why this change is worth it?

This PR removes the stacklevel requirement from resample deprecation warnings. This change is not worth it in isolation; it is only easing the way for other things to go ahead (namely, all of this started because of #36369). The initial solution was not to remove the stacklevel checks, but to implement a more robust solution for setting the correct stacklevel (#36629), but you pointed out that it was too complex to be worth it for just a few deprecation warnings. So we both agreed the simplest solution was to remove the stacklevel requirement from these specific tests.

See also:

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok merge master and ping on green

@jorisvandenbossche
Copy link
Member

So we both agreed the simplest solution was to remove the stacklevel requirement from these specific tests.

I didn't fully follow the original cause that all triggered this discussion but: yes the current checks are fragile, but they are already implemented, and are working. So why do the effort of removing them?

You mention it is because of #36369. But how does improving the docs impact this?
(and #36369 seems a really worthwhile contribution! I would rather much focus our efforts on improving the docs in that PR, than in this discussion where we seem to be talking a bit past each other...)

@plammens
Copy link
Contributor Author

plammens commented Dec 15, 2020

So we both agreed the simplest solution was to remove the stacklevel requirement from these specific tests.

I didn't fully follow the original cause that all triggered this discussion but: yes the current checks are fragile, but they are already implemented, and are working. So why do the effort of removing them?

You mention it is because of #36369. But how does improving the docs impact this?
(and #36369 seems a really worthwhile contribution! I would rather much focus our efforts on improving the docs in that PR, than in this discussion where we seem to be talking a bit past each other...)

You are right @jorisvandenbossche that perhaps things got a bit sidetracked. To make sure we're on the same page, here's a quick summary of what happened so far, in chronological order:

  1. I started working on DOC: Enhance asfreq docs #36369; I made the documentation changes and committed.
  2. When running the tests though, I noticed the stacklevel checks for resample deprecation warnings were failing. This was because in order to make an addition to the docs of each individually, I made a "dummy" override (i.e. an override that just calls the super implementation) of NDFrame.resample for Series and DataFrame, using a shared doc template. Since this results in a different stack when the resample call reaches the warning code, the stacklevel checks were failing. (Full details in DOC: Enhance asfreq docs #36369.)
  3. I saw that the way these checks were satisfied was hardcoding some magic numbers which depended on a specific call graph, which breaks easily when refactoring (as shown in this case). (Again, full details in DOC: Enhance asfreq docs #36369 and CLN: Improve setting of resample deprecation warnings' stacklevel #36629.)
  4. Instead of piling on some more magic numbers, I decided it would have been better to implement a more robust solution for setting the appropriate stacklevel. Initially, I added this directly to DOC: Enhance asfreq docs #36369, but then I was asked to extract this into a separate PR as it was getting out of scope; the result was CLN: Improve setting of resample deprecation warnings' stacklevel #36629.
  5. After some discussion with Jeff, it seemed the solution I implemented was too complex for its purpose, and so we determined that the simplest solution would be to just turn off the stacklevel checks for these deprecation warnings (which are going to be removed entirely in 2.0 in any case).

So going back to #36369: while I could ditch the changes I made for Series.resample and DataFrame.resample and just use a single docstring in NDFrame.resample (which would imply using a lot of sentences of the form "if self is a Series, then ..., otherwise, if self is a DataFrame, ..."), it seems a pity to me that we would essentially be blocking an improvement to the documentation simply because of some stacklevel checks and the hacky implementation that satisfies them...

In summary, I would either merge the fix #36629 (or perhaps a simpler alternative) or remove the stacklevel checks (this PR), rather than scrap the documentation changes.

@plammens plammens force-pushed the remove-stacklevel-checks branch from e842308 to 055f01d Compare December 15, 2020 15:00
@jorisvandenbossche
Copy link
Member

Thanks for the overview :)

  1. When running the tests though, I noticed the stacklevel checks for resample deprecation warnings were failing. This was because in order to make an addition to the docs of each individually, I made a "dummy" override (i.e. an override that just calls the super implementation) of NDFrame.resample for Series and DataFrame, using a shared doc template. Since this results in a different stack when the resample call reaches the warning code, the stacklevel checks were failing. (Full details in DOC: Enhance asfreq docs #36369.)

So by adding those Series/DataFrame methods (which I think is certainly a good change to ensure proper docs for both), this increased the stacklevel by 1, causing the test failure. But so we could also simply increase the stacklevel set for the warning by 1 and be done with it? (or is it not that simple .. ?)

(and yes, this "magic" number easily breaks when refactoring, as it happened for you. But that's what the tests are for, and thus after refactoring, we can then update this "magic" number, and the tests pass again. And IMO it's really not that magic, everywhere we issue a warning in the codebase, we need to set a stacklevel. The only complicating issue here is that multiple code paths could reach the same warning with a different stacklevel, in which case we could also choose one.
And yes, this way it is quite brittle, but I think updating the stacklevel manually takes less time than all those discussions ...)

In summary, I would either merge the fix #36629 (or perhaps a simpler alternative) or remove the stacklevel checks (this PR), rather than scrap the documentation changes.

Or, a fourth alternative: keep the documentation improvement, and update the magic number?

@jreback
Copy link
Contributor

jreback commented Dec 16, 2020

yeah like @jorisvandenbossche 4th suggestion here. let's try to fix the docs and update to make things pass. if there is an unavoidable situation, then yes remove the stack level check. let's just do it in that other PR.

@plammens
Copy link
Contributor Author

plammens commented Dec 17, 2020

But so we could also simply increase the stacklevel set for the warning by 1 and be done with it? (or is it not that simple .. ?)

Precisely, it's not that simple: DataFrame.resample and Series.resample should have stacklevel 5 now, but DataFrameGroupBy.resample still has to have stacklevel 4. Before, it just happened to be that both NDFrame.resample and DataFrameGroupBy.resample needed a stacklevel of 4, and they both arrived to the warning code through TimeGrouper. (It amuses me how the more I go into this, the more hacky this solution seems to be :D ) Also, for completeness, running something like NDFrame.resample(df) would also have the wrong stacklevel since we now increased it by 1.

You can reproduce this by checking out 27b16cf, modifying this

stacklevel = 4 if cls is TimeGrouper else 2

to stacklevel = 5 if cls is TimeGrouper else 2 and run the tests in pandas.tests.resample.test_deprecated. All tests will pass except this:

with tm.assert_produces_warning(FutureWarning):
df.groupby("a").resample("3T", base=0).sum()
with tm.assert_produces_warning(FutureWarning):
df.groupby("a").resample("3T", loffset="0s").sum()

(and yes, this "magic" number easily breaks when refactoring, as it happened for you. But that's what the tests are for, and thus after refactoring, we can then update this "magic" number, and the tests pass again. And IMO it's really not that magic, everywhere we issue a warning in the codebase, we need to set a stacklevel. The only complicating issue here is that multiple code paths could reach the same warning with a different stacklevel, in which case we could also choose one.

It's not "magic" in the sense of "arbitrary", but it is in the sense that it is strongly reliant on the implementation of many other things: in order to be able to set a correct stacklevel, one must determine the stacklevel for every single path that might reach the warning code—and not only that, one must be able to compute it at the point the warning is called.

It just happened to be that, at the time that code was written, determining the correct stacklevel was relatively hassle-free. But in many cases, as in the example above, this won't be that trivial, and the only reliable way will be to actually inspect the stack (which seems a bit overkill for just some warnings).

And yes, this way it is quite brittle, but I think updating the stacklevel manually takes less time than all those discussions ...)

Well, I would have thought it's better to spend the time discussing it now and resolving it for a long time rather than having to revisit and re-fix this every time someone refactors anything related to the resample code... :)

But of course, the choice is yours; you're more experienced and perhaps I'm worrying about nothing.

Or, a fourth alternative: keep the documentation improvement, and update the magic number?

The only way I'd see this as an alternative is if we also remove the .groupby().resample checks shown above, as these won't pass if we just increase the number by 1.

@jorisvandenbossche
Copy link
Member

Precisely, it's not that simple: DataFrame.resample and Series.resample should have stacklevel 5 now, but DataFrameGroupBy.resample still has to have stacklevel 4. Before, it just happened to be that both NDFrame.resample and DataFrameGroupBy.resample needed a stacklevel of 4, and they both arrived to the warning code through TimeGrouper. (

Then we should simply choose the one that we find most important to have the correct stacklevel. Which in this case is clearly DataFrame/Series.resample, I think.

It amuses me how the more I go into this, the more hacky this solution seems to be :D ) Also, for completeness, running something like NDFrame.resample(df) would also have the wrong stacklevel since we now increased it by 1.

Luckily nobody does that ;) (and we also shouldn't care if someone did, it's not public)

Well, I would have thought it's better to spend the time discussing it now and resolving it for a long time rather than having to revisit and re-fix this every time someone refactors anything related to the resample code... :)

But of course, the choice is yours; you're more experienced and perhaps I'm worrying about nothing.

The main thing I care about (and the reason I am involving myself here), is that we set correct stacklevels as much as reasonably possible, because IMO it's of big importance for usability when dealing with such warnings.
Ideally, python would just have known it itself what stacklevel to choose, but unfortunately, that's not the case ;) (well, technically, one could write some python code to infer it from the frames, but might also get too complex/brittle on its own)

The only way I'd see this as an alternative is if we also remove the .groupby().resample checks shown above, as these won't pass if we just increase the number by 1.

Yes, those case then be removed.

@plammens
Copy link
Contributor Author

Then we should simply choose the one that we find most important to have the correct stacklevel. Which in this case is clearly DataFrame/Series.resample, I think.

Yes, those case then be removed.

Okay, I will go for this in the original PR then. Thanks for all your patience!

@plammens plammens closed this Dec 18, 2020
plammens added a commit to plammens/pandas that referenced this pull request Dec 18, 2020
See pandas-dev#37603 for rationale regarding this solution and the removal of some of the warning tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Testing pandas testing functions or related to the test suite

3 participants