Skip to content
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -638,6 +638,7 @@ Plotting

- Bug in :func:`DataFrame.plot.scatter` and :func:`DataFrame.plot.hexbin` caused x-axis label and ticklabels to disappear when colorbar was on in IPython inline backend (:issue:`10611`, :issue:`10678`, and :issue:`20455`)
- Bug in plotting a Series with datetimes using :func:`matplotlib.axes.Axes.scatter` (:issue:`22039`)
- The new argument ``equal_bins`` in :func:`DataFrameGroupBy.hist` sets histogram bins to equal values (:issue:`22222`)

Groupby/Resample/Rolling
^^^^^^^^^^^^^^^^^^^^^^^^
Expand Down
15 changes: 14 additions & 1 deletion pandas/core/groupby/groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ class providing the base-class of operations.
from pandas.core.dtypes.common import (
is_numeric_dtype,
is_scalar,
is_integer,
ensure_float)
from pandas.core.dtypes.cast import maybe_downcast_to_dtype
from pandas.core.dtypes.missing import isna, notna
Expand Down Expand Up @@ -578,15 +579,27 @@ def wrapper(*args, **kwargs):
# a little trickery for aggregation functions that need an axis
# argument
kwargs_with_axis = kwargs.copy()
kwargs_wo_axis = kwargs.copy()

if 'axis' not in kwargs_with_axis or \
kwargs_with_axis['axis'] is None:
kwargs_with_axis['axis'] = self.axis

if name == 'hist' and kwargs_wo_axis.pop('equal_bins', False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think another copy of kwargs is the cleanest solution here - can you not just get from the kwargs dict instead of popping?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case matplotltib.pyplot.hist throws an error that equal_bins is not recognized. An alternative solution that I can think of is to pass equal_bins as a named argument to hist_series in pandas/plotting/_core.py. But then it will be a dummy variable that's not used within the function. Any suggestions?

Copy link
Contributor Author

@javadnoorb javadnoorb Aug 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify a bit more ~~~, in python3 (but for some reason not python2),~~~ any extra argument passed to kwargswill be read in this line (as kwds):

ax.hist(values, bins=bins, **kwds)

# GH-22222
# if bins==None, use default value used in `hist_series`
bins = kwargs_wo_axis.pop('bins', 10)
if is_integer(bins):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just use is_scalar here since we only need to guard against scalar vs sequence

# share the same numpy array for all group bins
bins = np.linspace(self.obj.min(),
self.obj.max(), bins + 1)
kwargs_wo_axis['bins'] = bins

def curried_with_axis(x):
return f(x, *args, **kwargs_with_axis)

def curried(x):
return f(x, *args, **kwargs)
return f(x, *args, **kwargs_wo_axis)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 611 will be run b/c 'hist' is in base.plotting_methods, the result is returned, so this change should only have a local effect.


# preserve the name so we can detect it when calling plot methods,
# to avoid duplicates
Expand Down
8 changes: 6 additions & 2 deletions pandas/plotting/_core.py
Original file line number Diff line number Diff line change
Expand Up @@ -2470,8 +2470,11 @@ def hist_series(self, by=None, ax=None, grid=True, xlabelsize=None,
bin edges are calculated and returned. If bins is a sequence, gives
bin edges, including left edge of first bin and right edge of last
bin. In this case, bins is returned unmodified.
bins: integer, default 10
Number of histogram bins to be used
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is repeated and seems redundant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeated where? This should still stay, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bins is defined on line 2468 and 2473. It looks like a mistake when writing the docstring. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK didn't see that. This is fine then

equal_bins : boolean, default False
Uses the overall maximum and minimum of the groups to set a shared
bins sequence, leading to equal bin widths for all
groups (only works if bins==None or int).

`**kwds` : keywords
To be passed to the actual plotting function

Expand All @@ -2480,6 +2483,7 @@ def hist_series(self, by=None, ax=None, grid=True, xlabelsize=None,
matplotlib.axes.Axes.hist : Plot a histogram using matplotlib.

"""
# TODO: separate docstrings of series and groupby hist functions (GH-22241)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need this comment here - the open issue should suffice

import matplotlib.pyplot as plt

if by is None:
Expand Down
17 changes: 17 additions & 0 deletions pandas/tests/plotting/test_groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,23 @@
from pandas.tests.plotting.common import TestPlotBase


@td.skip_if_no_mpl
def test_hist_bins_match():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this function into the class that already exists in the module

# GH-22222
N = 100
bins = 5

np.random.seed(0)
df = DataFrame(np.append(np.random.randn(N), np.random.randn(N) / 10),
columns=['rand'])
df['group'] = [0] * N + [1] * N
g = df.groupby('group')['rand']
ax = g.hist(bins=bins, alpha=0.7, equal_bins=True)[0]
bin_width_group0 = ax.patches[0].get_width()
bin_width_group1 = ax.patches[bins].get_width()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using np.random can you not define a set of data whereby you can also easily assert that both groups have an equal number of bins across the same X-scale?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can replace np.random with some predefined values. But I think that comparing the number of bins wouldn't do the job. That's the current state of the code. We're here just changing the range without changing the number of bins.

I've written a new test, that compares the x-axis values of the bins, that should be more rigorous. For some reason the functions see kwargs differently in python 2 and 3, so some CI tests fail in the former. I'll make a commit once I figure out what's going on.

assert np.isclose(bin_width_group0, bin_width_group1)


@td.skip_if_no_mpl
class TestDataFrameGroupByPlots(TestPlotBase):

Expand Down