ENH: option for groupby.hist to match bins #22228
Changes from 5 commits
8a9ddfb 2d003b4 277035a 8e3feb7 006aea7 f488e88 5a14dec 650405d a5a287f 82c1cc9 18c2564 d52ec4b 4a42feb adccd1a
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -28,6 +28,7 @@ class providing the base-class of operations. | |
| from pandas.core.dtypes.common import ( | ||
| is_numeric_dtype, | ||
| is_scalar, | ||
| is_integer, | ||
| ensure_float) | ||
| from pandas.core.dtypes.cast import maybe_downcast_to_dtype | ||
| from pandas.core.dtypes.missing import isna, notna | ||
| | @@ -578,15 +579,27 @@ def wrapper(*args, **kwargs): | |
| # a little trickery for aggregation functions that need an axis | ||
| # argument | ||
| kwargs_with_axis = kwargs.copy() | ||
| kwargs_wo_axis = kwargs.copy() | ||
| | ||
| if 'axis' not in kwargs_with_axis or \ | ||
| kwargs_with_axis['axis'] is None: | ||
| kwargs_with_axis['axis'] = self.axis | ||
| | ||
| if name == 'hist' and kwargs_wo_axis.pop('equal_bins', False): | ||
| # GH-22222 | ||
| # if bins==None, use default value used in `hist_series` | ||
| bins = kwargs_wo_axis.pop('bins', 10) | ||
| if is_integer(bins): | ||
| ||
| # share the same numpy array for all group bins | ||
| bins = np.linspace(self.obj.min(), | ||
| self.obj.max(), bins + 1) | ||
| kwargs_wo_axis['bins'] = bins | ||
| | ||
| def curried_with_axis(x): | ||
| return f(x, *args, **kwargs_with_axis) | ||
| | ||
| def curried(x): | ||
| return f(x, *args, **kwargs) | ||
| return f(x, *args, **kwargs_wo_axis) | ||
| ||
| | ||
| # preserve the name so we can detect it when calling plot methods, | ||
| # to avoid duplicates | ||
| | ||
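The shared-bin logic in the hunk above can be sketched outside pandas. This is only an illustration under stated assumptions: `shared_bin_edges` and the `groups` list are hypothetical names, not part of pandas, and plain NumPy arrays stand in for the grouped object.

```python
import numpy as np


def shared_bin_edges(groups, bins=10):
    """Return bin edges spanning the overall min/max of all groups.

    `groups` is a hypothetical list of 1-D arrays standing in for the
    per-group data; it is not a pandas object.
    """
    if not isinstance(bins, (int, np.integer)):
        # A sequence of edges is already shared by every group.
        return np.asarray(bins)
    lo = min(np.min(g) for g in groups)
    hi = max(np.max(g) for g in groups)
    # `bins` intervals need `bins + 1` edges.
    return np.linspace(lo, hi, bins + 1)


rng = np.random.RandomState(0)
wide = rng.randn(100)
narrow = rng.randn(100) / 10

edges = shared_bin_edges([wide, narrow], bins=5)
counts_wide, _ = np.histogram(wide, bins=edges)
counts_narrow, _ = np.histogram(narrow, bins=edges)
```

Because the edges cover the overall range, every value of every group falls inside some bin, and both histograms use identical bin widths.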
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -2470,8 +2470,11 @@ def hist_series(self, by=None, ax=None, grid=True, xlabelsize=None, | |
| bin edges are calculated and returned. If bins is a sequence, gives | ||
| bin edges, including left edge of first bin and right edge of last | ||
| bin. In this case, bins is returned unmodified. | ||
| bins: integer, default 10 | ||
| Number of histogram bins to be used | ||
| Contributor Author: This is repeated and seems redundant. | ||
| Member: Repeated where? This should still stay, no? | ||
| Contributor Author: (reply content missing from this capture) | ||
| Member: Ah OK, didn't see that. This is fine then. | ||
| equal_bins : boolean, default False | ||
| Uses the overall maximum and minimum of the groups to set a shared | ||
| bins sequence, leading to equal bin widths for all | ||
| groups (only works if bins==None or int). | ||
| | ||
| `**kwds` : keywords | ||
| To be passed to the actual plotting function | ||
| | ||
| | @@ -2480,6 +2483,7 @@ def hist_series(self, by=None, ax=None, grid=True, xlabelsize=None, | |
| matplotlib.axes.Axes.hist : Plot a histogram using matplotlib. | ||
| | ||
| """ | ||
| # TODO: separate docstrings of series and groupby hist functions (GH-22241) | ||
| ||
| import matplotlib.pyplot as plt | ||
| | ||
| if by is None: | ||
| | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -12,6 +12,23 @@ | |
| from pandas.tests.plotting.common import TestPlotBase | ||
| | ||
| | ||
| @td.skip_if_no_mpl | ||
| def test_hist_bins_match(): | ||
| ||
| # GH-22222 | ||
| N = 100 | ||
| bins = 5 | ||
| | ||
| np.random.seed(0) | ||
| df = DataFrame(np.append(np.random.randn(N), np.random.randn(N) / 10), | ||
| columns=['rand']) | ||
| df['group'] = [0] * N + [1] * N | ||
| g = df.groupby('group')['rand'] | ||
| ax = g.hist(bins=bins, alpha=0.7, equal_bins=True)[0] | ||
| bin_width_group0 = ax.patches[0].get_width() | ||
| bin_width_group1 = ax.patches[bins].get_width() | ||
| ||
| assert np.isclose(bin_width_group0, bin_width_group1) | ||
| | ||
| | ||
| @td.skip_if_no_mpl | ||
| class TestDataFrameGroupByPlots(TestPlotBase): | ||
| | ||
| | ||
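The property the new test asserts can also be checked without matplotlib. The sketch below assumes `np.histogram` as a stand-in for the plotting call. It mirrors the test's data setup and compares bin widths under independent and shared binning.

```python
import numpy as np

N, bins = 100, 5
rng = np.random.RandomState(0)
group0 = rng.randn(N)        # wide spread
group1 = rng.randn(N) / 10   # narrow spread

# Independent binning: each group derives edges from its own range.
_, own_edges0 = np.histogram(group0, bins=bins)
_, own_edges1 = np.histogram(group1, bins=bins)
own_width0 = np.diff(own_edges0)[0]
own_width1 = np.diff(own_edges1)[0]

# Shared binning: one set of edges from the overall min and max.
lo = min(group0.min(), group1.min())
hi = max(group0.max(), group1.max())
shared = np.linspace(lo, hi, bins + 1)
_, shared_edges0 = np.histogram(group0, bins=shared)
_, shared_edges1 = np.histogram(group1, bins=shared)
shared_width0 = np.diff(shared_edges0)[0]
shared_width1 = np.diff(shared_edges1)[0]
```

With independent binning the narrow group gets much thinner bins, which is the mismatch `equal_bins=True` is meant to remove.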
I don't think another copy of kwargs is the cleanest solution here - can you not just `get` from the kwargs dict instead of popping?
In that case `matplotlib.pyplot.hist` throws an error that `equal_bins` is not recognized. An alternative solution that I can think of is to pass `equal_bins` as a named argument to `hist_series` in pandas/plotting/_core.py. But then it will be a dummy variable that's not used within the function. Any suggestions?
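The difference between `get` and `pop` discussed here can be reproduced with a small stand-in. In this sketch `plot_hist` is a hypothetical function with a fixed signature, used in place of the real plotting call; it is not the matplotlib or pandas API.

```python
def plot_hist(values, bins=10, alpha=1.0):
    """Hypothetical plotting stand-in that accepts no extra keywords."""
    return {"n": len(values), "bins": bins, "alpha": alpha}


def forward_with_get(values, **kwargs):
    # `get` reads the flag but leaves it in kwargs, so it is forwarded.
    equal_bins = kwargs.get("equal_bins", False)
    return plot_hist(values, **kwargs)


def forward_with_pop(values, **kwargs):
    # `pop` reads the flag and removes it before forwarding.
    equal_bins = kwargs.pop("equal_bins", False)
    return plot_hist(values, **kwargs)


try:
    forward_with_get([1, 2, 3], bins=5, equal_bins=True)
    get_error = None
except TypeError as exc:
    # The unknown keyword reaches plot_hist and is rejected.
    get_error = exc

result = forward_with_pop([1, 2, 3], bins=5, equal_bins=True)
```

In the diff, the `pop` is applied to a copy (`kwargs_wo_axis`) so the original `kwargs` captured by the wrapper is left unmodified.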
To clarify a bit more ~~, in python3 (but for some reason not python2),~~ any extra argument passed to `kwargs` will be read in this line (as `kwds`):
pandas/pandas/plotting/_core.py
Line 2501 in 0370740