-
- Notifications
You must be signed in to change notification settings - Fork 19.4k
ENH: option for groupby.hist to match bins #22228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 12 commits
8a9ddfb 2d003b4 277035a 8e3feb7 006aea7 f488e88 5a14dec 650405d a5a287f 82c1cc9 18c2564 d52ec4b 4a42feb adccd1a File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -28,6 +28,7 @@ class providing the base-class of operations. | |
| from pandas.core.dtypes.common import ( | ||
| is_numeric_dtype, | ||
| is_scalar, | ||
| is_integer, | ||
| ensure_float) | ||
| from pandas.core.dtypes.cast import maybe_downcast_to_dtype | ||
| from pandas.core.dtypes.missing import isna, notna | ||
| | @@ -581,6 +582,17 @@ def wrapper(*args, **kwargs): | |
| if 'axis' not in kwargs_with_axis or \ | ||
| kwargs_with_axis['axis'] is None: | ||
| kwargs_with_axis['axis'] = self.axis | ||
| if (name == 'hist' and | ||
| kwargs.get('equal_bins', False) is True): | ||
| # GH-22222 | ||
| bins = kwargs.get('bins') | ||
| Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm wondering if we even need to use .get here - can we just access by index? I'm assuming that 'equal_bins' should always be there so a KeyError when it's not would be fine Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same py2/3 issue. In py3 Contributor There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do not add anything here this is obscuring already extremely fragile code Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you mean not in the wrapper? Or also beyond that? Is there any other place you'd suggest? | ||
| if bins is None: | ||
| Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is the purpose of this? Doesn't the keyword already default to 10? Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The keyword defaults to 10 after the function is called ( Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you point me to where | ||
| bins = 10 # use default value used in `hist_series` | ||
| if is_integer(bins): | ||
| ||
| # share the same numpy array for all group bins | ||
| bins = np.linspace(self.obj.min(), | ||
| self.obj.max(), bins + 1) | ||
| kwargs['bins'] = bins | ||
| | ||
| def curried_with_axis(x): | ||
| return f(x, *args, **kwargs_with_axis) | ||
| | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -2443,7 +2443,7 @@ def hist_frame(data, column=None, by=None, grid=True, xlabelsize=None, | |
| | ||
| def hist_series(self, by=None, ax=None, grid=True, xlabelsize=None, | ||
| xrot=None, ylabelsize=None, yrot=None, figsize=None, | ||
| bins=10, **kwds): | ||
| bins=10, equal_bins=False, **kwds): | ||
| Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm trying the alternative approach here. This also takes care of py2/3 compat issue. Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Might have missed the compat issue but if you are using Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
| ||
| """ | ||
| Draw histogram of the input series using matplotlib | ||
| | ||
| | @@ -2470,8 +2470,11 @@ def hist_series(self, by=None, ax=None, grid=True, xlabelsize=None, | |
| bin edges are calculated and returned. If bins is a sequence, gives | ||
| bin edges, including left edge of first bin and right edge of last | ||
| bin. In this case, bins is returned unmodified. | ||
| bins: integer, default 10 | ||
| Number of histogram bins to be used | ||
| Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is repeated and seems redundant. Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Repeated where? This should still stay, no? Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah OK didn't see that. This is fine then | ||
| equal_bins : boolean, default False | ||
| Uses the overall maximum and minimum of the groups to set a shared | ||
| bins sequence, leading to equal bin widths for all | ||
| groups (only works if bins==None or int). | ||
| | ||
| `**kwds` : keywords | ||
| To be passed to the actual plotting function | ||
| | ||
| | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -10,10 +10,54 @@ | |
| import numpy as np | ||
| | ||
| from pandas.tests.plotting.common import TestPlotBase | ||
| import pytest | ||
| | ||
| | ||
| @td.skip_if_no_mpl | ||
| class TestDataFrameGroupByPlots(TestPlotBase): | ||
| @pytest.mark.parametrize( | ||
| ||
| 'bins, equal_bins', | ||
| zip([5, None, np.linspace(-3, 3, 10)], [True, False]) | ||
| ) | ||
| ||
| def test_hist_bins_match(self, bins, equal_bins): | ||
| # GH-22222 | ||
| N = 100 | ||
| df = DataFrame(np.append(np.linspace(-3, 3, N), | ||
| np.linspace(-1, 1, N)), | ||
| columns=['rand']) | ||
| df['group'] = [0] * N + [1] * N | ||
| g = df.groupby('group')['rand'] | ||
| axes = g.hist(bins=bins, alpha=0.7, equal_bins=equal_bins) | ||
| ax = axes[0] | ||
| | ||
| if bins is None: | ||
| ||
| num_bins = 10 # default value used in `hist_series` | ||
| elif type(bins) == np.ndarray: | ||
| num_bins = len(bins) - 1 | ||
| else: | ||
| num_bins = bins | ||
| | ||
| # individual hist bar coordinates for: | ||
| # group0: points[:num_bins] | ||
| # group1: points[num_bins:] | ||
| points = np.array([patch.get_bbox().get_points() | ||
| for patch in ax.patches]) | ||
| | ||
| if equal_bins or type(bins) == np.ndarray: | ||
| # compare leftmost point on x-axis for each bar | ||
| # in the two groups | ||
| assert np.isclose(points[:num_bins, 0, 0], | ||
| points[num_bins:, 0, 0] | ||
| ).all() | ||
| else: | ||
| # range of values on x-axis (min, max) for each group | ||
| hist_ranges = DataFrame( | ||
| [[points[0, 0, 0], points[num_bins - 1, 1, 0]], | ||
| [points[num_bins, 0, 0], points[-1, 1, 0]]], | ||
| index=[0, 1], columns=['min', 'max']) | ||
| group_ranges = g.agg([min, max]) | ||
| assert np.isclose(group_ranges, hist_ranges).all() | ||
| tm.close() | ||
| Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure what the purpose of this call is? Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is to close the figure (to avoid overwriting it during | ||
| | ||
| def test_series_groupby_plotting_nominally_works(self): | ||
| n = 10 | ||
| | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need
is Truehere or can we use the implicit truthiness of objects?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just a bit safer this way, as it guards against some non-boolean statements by the user, such as
equal_bins=None. But your suggestion should be fine if we ignore bad inputs from user.