-
- Notifications
You must be signed in to change notification settings - Fork 19.4k
DOC: improve groupby.DataFrameGroupBy.resample docstring #20020
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: improve groupby.DataFrameGroupBy.resample docstring #20020
Conversation
| Hello @pandres! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on March 09, 2018 at 16:01 Hours UTC |
pandas/core/groupby.py Outdated
| Downsample the series into 3 minute bins and sum the values | ||
| of the timestamps falling into a bin. | ||
| >>> series.resample('3T').sum() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is resample on a groupby object, so the examples should also show that. See here for some explanation of this: https://pandas.pydata.org/pandas-docs/stable/groupby.html#new-syntax-to-window-and-resample-operations
pandas/core/groupby.py Outdated
| value in the resampled bucket with the label ``2000-01-01 00:03:00`` | ||
| does not include 3 (if it did, the summed value would be 6, not 3). | ||
| To include this value close the right side of the bin interval as | ||
| illustrated in the example below this one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General feedback: best to try to keep this text shorter.
And related to my comment above: I don't think you need to explain the details here of the resample method (for that you can refer to DataFrame.resample and include a "See Also" section), but rather how it is used together with groupby.
| Ah, OK, I see you actually copied the examples from the Series/DataFrame.resample docstring, that's why they were off topic :) |
| @jorisvandenbossche, thank you for the feedback. Maybe I rushed here, we were just doing a warm-up for the sprint on Saturday. We do have questions though, related with the |
| If something is unclear, I think it is best to ask on the gitter channel |
Codecov Report
@@ Coverage Diff @@ ## master #20020 +/- ## ========================================== + Coverage 91.67% 91.67% +<.01% ========================================== Files 150 150 Lines 49096 49096 ========================================== + Hits 45007 45009 +2 + Misses 4089 4087 -2
Continue to review full report at Codecov.
|
| Not sure what's going on with the circleci failures. |
datapythonista left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments, but please:
- Make sure you read the documentation (it explains the next points there, among other things)
- Make sure somebody else (not the author) understands what the method does, and find this page useful. The (local) reviewer should try to imagine that she/he is a pandas user who wants to use this method. If this page doesn't help, is confusing... Fix it until it's "perfect".
- Run the
validate_docstrings.pyscript, and make sure you don't have any of the obvious mistake the script detects.
| """Provide resampling when using a TimeGrouper. | ||
| Return a new grouper with our resampler appended | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is the original description, but the goal of the sprint is to make it better. After reading these texts, it's still difficult for me to understand what this method is about. In which cases it'll be useful for me.
| def resample(self, rule, *args, **kwargs): | ||
| """ | ||
| Provide resampling when using a TimeGrouper | ||
| """Provide resampling when using a TimeGrouper. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The short summary goes to the next line. Did you run the ./validate_docstrings.py script with it? It should let you detect formatting errors like this, and save time from reviewers, on things that a script can tell you.
| -------- | ||
| Start by creating a DataFrame with 9 one minute timestamps. | ||
| >>> index = pd.date_range('1/1/2000', periods=9, freq='T') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say that we leave a blank line between the text and the code. Does it render all right in the html?
The convention is to use idx instead of index.
| 2000-01-01 00:07:00 0 1 2 3 | ||
| 2000-01-01 00:08:00 0 1 2 3 | ||
| >>> series = pd.Series(range(9), index=index) # delete this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The convention is to use s instead of series, you can check these in the documentation guide. # delete this is because you forgot to delete this? it doesn't seem to add value this line.
| Parameters | ||
| ---------- | ||
| rule : str | ||
| The offset string or object representing target conversion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this a bit cryptic. I think we can find a better explanation. And I'm sure there is somewhere in the pandas documentation where it explains which are the different values, we could link it. Also we could provide an example.
| function. | ||
| **kwargs | ||
| These parameters will be passed to the get_resampler_for_grouping | ||
| function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be good to do some research on what get_resampler_for_grouping does with them, which can be used, and what's the impact on this method.
After asking for feedback, this is the very minimal docstring that we interpret that would be acceptable.
This is in order to unlock the full list of functions we have assigned for the pandas documentation sprint.
If there is something to be improved here please tell us. Also we may make a new push of this documentation including the full scope of the args and kwargs parameters and more examples.