-
- Notifications
You must be signed in to change notification settings - Fork 19.4k
Roll quantile support multiple quantiles as per #12093: rough sketch for discussion #33843
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @cottrell sorry this hasn't seen any attention; we have a number of PRs. conceptually this would be ok. I would want to refactor out the actual quantile code to make the multiple loops in cython reasonable. Does this change the perf of the single quantile case? the return value here is also slightly tricky as now we'd have to return a frame (rather than a series), but that's prob ok (I think we do that anyhow for regular quantile). please merge master. |
| cc @mroeschke |
| Old timings are in the issue #12093 I just tried on some latest pandas and see same order for usual single quantile. I've sort of forgotten the details here but I think the most time consuming and complicated bit was really the python and passing arguments through the rolling patterns. Any cython changes are likely easy or at least transparent. I think the API itself likely requires some pandas internals knowledge and opinions. |
pandas/_libs/window/aggregations.pyx Outdated
| # Single value in skip list | ||
| output[i] = skiplist_get(skiplist, 0, &ret) | ||
| tmp = skiplist_get(skiplist, 0, &ret) | ||
| for j in range(0, Nq): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: You can change these to range(Nq)
| continue | ||
| else: | ||
| raise DataError("No numeric types to aggregate") from err | ||
| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you undo this whitespacing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be an autoformat I think. Probably autoformat is missing or is missing an idempotent test.
@cottrell sorry this hasn't seen any attention; we have a number of PRs.
conceptually this would be ok. I would want to refactor out the actual quantile code to make the multiple loops in cython reasonable.
Does this change the perf of the single quantile case?
the return value here is also slightly tricky as now we'd have to return a frame (rather than a series), but that's prob ok (I think we do that anyhow for regular quantile).
please merge master.
Also should this branch not have been rebased? I thought that was the pattern pandas dev was following?
| quantiles=quantile, | ||
| interpolation=interpolation, | ||
| ) | ||
| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same
| df.columns.name = obj.name | ||
| return df | ||
| elif ( | ||
| result.ndim == 3 and columns is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
result should always be less than 2 dim
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? This is the whole point on this thing. I can barely remember this change but this is a rolling (time), broadcast (columns), quantile (quantiles). I might have the order wrong in my comment here.
| from pandas import Series | ||
| | ||
| return Series(result, index, name=obj.name) | ||
| elif result.ndim == 2 and columns is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be handled by return type(obj)(result, index=index, columns=block.columns)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it breaks if this is not there? I would just try changing this and see ... this whole rolling.py API was a scratch. Somebody who knows the intentions should probably just either undo the rolling.py diff and quickly make the changes or unpick the stuff. There were many things where it is was not possible to really know what was intended there. Will push the other changes shortly.
| index = obj.index | ||
| | ||
| if isinstance(result, np.ndarray): | ||
| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same
| columns = None | ||
| # func is a functools.partial. Let's abuse this for now. | ||
| # this getattr madness is because of awful mypy | ||
| func_name = getattr(func, "func", func).__name__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be handled by wrap results
| @cottrell sorry for letting this PR sit dormant for a while. Mind fixing up the merge conflict? We had a large overhaul on the rolling code recently. |
| This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
| @cottrell can you rebase (a lot has changed recently). i think this is an interesting PR |
| Thanks but going to close this PR as stale. Please ping if you'd like to continue working on this. |
Putting this in a PR for discussion. It definitely needs more work on kicking the arguments through properly in the python interface but someone who has strong opinions about that can hopefully step in and then will refactor and test etc.
The "content" is really just the cython change.