-
- Notifications
You must be signed in to change notification settings - Fork 19.4k
ENH: option for groupby.hist to match bins #22228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
8a9ddfb 2d003b4 277035a 8e3feb7 006aea7 f488e88 5a14dec 650405d a5a287f 82c1cc9 18c2564 d52ec4b 4a42feb adccd1a File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -581,6 +581,16 @@ def wrapper(*args, **kwargs): | |
| if 'axis' not in kwargs_with_axis or \ | ||
| kwargs_with_axis['axis'] is None: | ||
| kwargs_with_axis['axis'] = self.axis | ||
| if name == 'hist' and kwargs.get('equal_bins', False): | ||
| Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can just do Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. | ||
| # GH-22222 | ||
| bins = kwargs.get('bins') | ||
| Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm wondering if we even need to use .get here - can we just access by index? I'm assuming that 'equal_bins' should always be there so a KeyError when it's not would be fine Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same py2/3 issue. In py3 Contributor There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do not add anything here this is obscuring already extremely fragile code Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you mean not in the wrapper? Or also beyond that? Is there any other place you'd suggest? | ||
| if bins is None: | ||
| Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you point me to where | ||
| bins = 10 # use default value used in `hist_series` | ||
| if is_scalar(bins): | ||
| # share the same numpy array for all group bins | ||
| bins = np.linspace(self.obj.min(), | ||
| self.obj.max(), bins + 1) | ||
| kwargs['bins'] = bins | ||
| | ||
| def curried_with_axis(x): | ||
| return f(x, *args, **kwargs_with_axis) | ||
| | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -2443,7 +2443,7 @@ def hist_frame(data, column=None, by=None, grid=True, xlabelsize=None, | |
| | ||
| def hist_series(self, by=None, ax=None, grid=True, xlabelsize=None, | ||
| xrot=None, ylabelsize=None, yrot=None, figsize=None, | ||
| bins=10, **kwds): | ||
| bins=10, equal_bins=False, **kwds): | ||
| Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm trying the alternative approach here. This also takes care of py2/3 compat issue. Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Might have missed the compat issue but if you are using Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
| ||
| """ | ||
| Draw histogram of the input series using matplotlib | ||
| | ||
| | @@ -2470,8 +2470,11 @@ def hist_series(self, by=None, ax=None, grid=True, xlabelsize=None, | |
| bin edges are calculated and returned. If bins is a sequence, gives | ||
| bin edges, including left edge of first bin and right edge of last | ||
| bin. In this case, bins is returned unmodified. | ||
| bins: integer, default 10 | ||
| Number of histogram bins to be used | ||
| Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is repeated and seems redundant. Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Repeated where? This should still stay, no? Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah OK didn't see that. This is fine then | ||
| equal_bins : boolean, default False | ||
| Uses the overall maximum and minimum of the groups to set a shared | ||
| bins sequence, leading to equal bin widths for all | ||
| groups (only works if bins==None or int). | ||
| | ||
| `**kwds` : keywords | ||
| To be passed to the actual plotting function | ||
| | ||
| | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -10,11 +10,52 @@ | |
| import numpy as np | ||
| | ||
| from pandas.tests.plotting.common import TestPlotBase | ||
| import pytest | ||
| | ||
| | ||
| @td.skip_if_no_mpl | ||
| class TestDataFrameGroupByPlots(TestPlotBase): | ||
| | ||
| @pytest.mark.parametrize('equal_bins, bins, expected1, expected2', ( | ||
| Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Need two | ||
| (True, 10, | ||
| [-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4], | ||
| [-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4]), | ||
| (True, None, | ||
| Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Assuming the default is always 10 for | ||
| [-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4], | ||
| [-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4]), | ||
| (True, [-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4, 3.], | ||
| [-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4], | ||
| [-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4]), | ||
| (False, [-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4, 3.], | ||
| [-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4], | ||
| [-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4]), | ||
| (False, 10, | ||
| [-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4], | ||
| [-1., -0.8, -0.6, -0.4, -0.2, 0., 0.2, 0.4, 0.6, 0.8]), | ||
| (False, None, | ||
| [-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4], | ||
| [-1., -0.8, -0.6, -0.4, -0.2, 0., 0.2, 0.4, 0.6, 0.8]) | ||
| )) | ||
| def test_hist_bins_match(self, equal_bins, bins, expected1, expected2): | ||
| # GH-22222 | ||
| df = DataFrame(np.append(np.linspace(-3, 3, 100), | ||
| np.linspace(-1, 1, 100)), | ||
| columns=['value']) | ||
| df['group'] = [0] * 100 + [1] * 100 | ||
| g = df.groupby('group')['value'] | ||
| ax = g.hist(bins=bins, alpha=0.7, equal_bins=equal_bins)[0] | ||
| | ||
| # x-axis (leftmost) hist bar coordinates: | ||
| # group0: points[:num_bins] | ||
| # group1: points[num_bins:] | ||
| num_bins = len(expected1) | ||
| points = np.array([patch.get_bbox().get_points() | ||
| for patch in ax.patches])[:, 0, 0] | ||
| | ||
| tm.assert_almost_equal(points[:num_bins], np.array(expected1)) | ||
| Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would prefer to use | ||
| tm.assert_almost_equal(points[num_bins:], np.array(expected2)) | ||
| tm.close() | ||
| Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is to close the figure (to avoid overwriting it during | ||
| | ||
| def test_series_groupby_plotting_nominally_works(self): | ||
| n = 10 | ||
| weight = Series(np.random.normal(166, 20, size=n)) | ||
| | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually
histmethod of aSeriesGroupByobject, although that't not available in the docs:https://pandas.pydata.org/pandas-docs/stable/api.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is available in
api.rstif that suffices:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just change the reference to DataFrameGroupBy.hist in the whatsnew - should suffice