Skip to content

Conversation

@cottrell
Copy link
Contributor

Putting this in a PR for discussion. It definitely needs more work on kicking the arguments through properly in the python interface but someone who has strong opinions about that can hopefully step in and then will refactor and test etc.

The "content" is really just the cython change.

@pep8speaks
Copy link

pep8speaks commented Apr 28, 2020

Hello @cottrell! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-07-01 14:41:44 UTC
@jreback
Copy link
Contributor

jreback commented Jun 14, 2020

@cottrell sorry this hasn't seen any attention; we have a number of PRs.

conceptually this would be ok. I would want to refactor out the actual quantile code to make the multiple loops in cython reasonable.

Does this change the perf of the single quantile case?

the return value here is also slightly tricky as now we'd have to return a frame (rather than a series), but that's prob ok (I think we do that anyhow for regular quantile).

please merge master.

@jreback jreback added the Window rolling, ewma, expanding label Jun 14, 2020
@jreback
Copy link
Contributor

jreback commented Jun 14, 2020

@cottrell
Copy link
Contributor Author

@jreback

Old timings are in the issue #12093

I just tried on some latest pandas and see same order for usual single quantile.

I've sort of forgotten the details here but I think the most time consuming and complicated bit was really the python and passing arguments through the rolling patterns. Any cython changes are likely easy or at least transparent. I think the API itself likely requires some pandas internals knowledge and opinions.

In [1]: s = pd.Series(np.random.randn(100000)) In [2]: n = 10 In [3]: q = np.linspace(1./n, 1 - 1./n, n - 1) In [4]: r = s.rolling(window=100) In [5]: %time a = pd.DataFrame({qq: r.quantile(qq) for qq in q}) CPU times: user 343 ms, sys: 21 ms, total: 364 ms Wall time: 372 ms In [6]: pd.__version__ Out[6]: '1.0.5' 
# Single value in skip list
output[i] = skiplist_get(skiplist, 0, &ret)
tmp = skiplist_get(skiplist, 0, &ret)
for j in range(0, Nq):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You can change these to range(Nq)

continue
else:
raise DataError("No numeric types to aggregate") from err

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you undo this whitespacing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be an autoformat I think. Probably autoformat is missing or is missing an idempotent test.

@cottrell sorry this hasn't seen any attention; we have a number of PRs.

conceptually this would be ok. I would want to refactor out the actual quantile code to make the multiple loops in cython reasonable.

Does this change the perf of the single quantile case?

the return value here is also slightly tricky as now we'd have to return a frame (rather than a series), but that's prob ok (I think we do that anyhow for regular quantile).

please merge master.

Also should this branch not have been rebased? I thought that was the pattern pandas dev was following?

quantiles=quantile,
interpolation=interpolation,
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

df.columns.name = obj.name
return df
elif (
result.ndim == 3 and columns is not None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result should always be less than 2 dim

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? This is the whole point on this thing. I can barely remember this change but this is a rolling (time), broadcast (columns), quantile (quantiles). I might have the order wrong in my comment here.

from pandas import Series

return Series(result, index, name=obj.name)
elif result.ndim == 2 and columns is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be handled by return type(obj)(result, index=index, columns=block.columns)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it breaks if this is not there? I would just try changing this and see ... this whole rolling.py API was a scratch. Somebody who knows the intentions should probably just either undo the rolling.py diff and quickly make the changes or unpick the stuff. There were many things where it is was not possible to really know what was intended there. Will push the other changes shortly.

index = obj.index

if isinstance(result, np.ndarray):

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

columns = None
# func is a functools.partial. Let's abuse this for now.
# this getattr madness is because of awful mypy
func_name = getattr(func, "func", func).__name__
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be handled by wrap results

@mroeschke
Copy link
Member

@cottrell sorry for letting this PR sit dormant for a while. Mind fixing up the merge conflict? We had a large overhaul on the rolling code recently.

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Oct 13, 2020
@jreback
Copy link
Contributor

jreback commented Oct 16, 2020

@cottrell can you rebase (a lot has changed recently). i think this is an interesting PR

@mroeschke
Copy link
Member

Thanks but going to close this PR as stale. Please ping if you'd like to continue working on this.

@mroeschke mroeschke closed this Nov 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Stale Window rolling, ewma, expanding

4 participants