Skip to content
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -640,6 +640,7 @@ Plotting

- Bug in :func:`DataFrame.plot.scatter` and :func:`DataFrame.plot.hexbin` caused x-axis label and ticklabels to disappear when colorbar was on in IPython inline backend (:issue:`10611`, :issue:`10678`, and :issue:`20455`)
- Bug in plotting a Series with datetimes using :func:`matplotlib.axes.Axes.scatter` (:issue:`22039`)
- The new argument ``equal_bins`` in :func:`SeriesGroupBy.hist` sets histogram bins to equal values (:issue:`22222`)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually hist method of a SeriesGroupBy object, although that't not available in the docs:

https://pandas.pydata.org/pandas-docs/stable/api.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is available in api.rst if that suffices:

The following methods are available in both ``SeriesGroupBy`` and ``DataFrameGroupBy`` objects, but may differ slightly, usually in that the ``DataFrameGroupBy`` version usually permits the specification of an axis argument, and often an argument indicating whether to restrict application to columns of a specific data type. .. autosummary:: :toctree: generated/ ... DataFrameGroupBy.hist 
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just change the reference to DataFrameGroupBy.hist in the whatsnew - should suffice


Groupby/Resample/Rolling
^^^^^^^^^^^^^^^^^^^^^^^^
Expand Down
10 changes: 10 additions & 0 deletions pandas/core/groupby/groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -581,6 +581,16 @@ def wrapper(*args, **kwargs):
if 'axis' not in kwargs_with_axis or \
kwargs_with_axis['axis'] is None:
kwargs_with_axis['axis'] = self.axis
if name == 'hist' and kwargs.get('equal_bins', False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can just do kwargs.get('equal_bins') - no need for the False argument

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason py3 and py2 kwargs behavior are different. In py3 kwargs defaults are not set here if the parameter is not provided by the user. I opened an issue on this (#22285). I'm not sure where to move the code (per @jreback's comment) to avoid this.

# GH-22222
bins = kwargs.get('bins')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm wondering if we even need to use .get here - can we just access by index? I'm assuming that 'equal_bins' should always be there so a KeyError when it's not would be fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same py2/3 issue. In py3 equal_bins, bins, etc do not take their default values unless the code inside hist_series function is being run.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not add anything here

this is obscuring already extremely fragile code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean not in the wrapper? Or also beyond that? Is there any other place you'd suggest?

if bins is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point me to where bins doesn't default to 10?

bins = 10 # use default value used in `hist_series`
if is_scalar(bins):
# share the same numpy array for all group bins
bins = np.linspace(self.obj.min(),
self.obj.max(), bins + 1)
kwargs['bins'] = bins

def curried_with_axis(x):
return f(x, *args, **kwargs_with_axis)
Expand Down
9 changes: 6 additions & 3 deletions pandas/plotting/_core.py
Original file line number Diff line number Diff line change
Expand Up @@ -2443,7 +2443,7 @@ def hist_frame(data, column=None, by=None, grid=True, xlabelsize=None,

def hist_series(self, by=None, ax=None, grid=True, xlabelsize=None,
xrot=None, ylabelsize=None, yrot=None, figsize=None,
bins=10, **kwds):
bins=10, equal_bins=False, **kwds):
Copy link
Contributor Author

@javadnoorb javadnoorb Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying the alternative approach here. equal_bins will be unused within the function. This is just to avoid popping kwds.

This also takes care of py2/3 compat issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might have missed the compat issue but if you are using .get it shouldn't matter whether this exists as a keyword or not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kwds can only contain matploltib.pyplot.hist input args (b/c of line 2504) and throws an error if equal_bins is a keyword.

"""
Draw histogram of the input series using matplotlib

Expand All @@ -2470,8 +2470,11 @@ def hist_series(self, by=None, ax=None, grid=True, xlabelsize=None,
bin edges are calculated and returned. If bins is a sequence, gives
bin edges, including left edge of first bin and right edge of last
bin. In this case, bins is returned unmodified.
bins: integer, default 10
Number of histogram bins to be used
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is repeated and seems redundant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeated where? This should still stay, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bins is defined on line 2468 and 2473. It looks like a mistake when writing the docstring. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK didn't see that. This is fine then

equal_bins : boolean, default False
Uses the overall maximum and minimum of the groups to set a shared
bins sequence, leading to equal bin widths for all
groups (only works if bins==None or int).

`**kwds` : keywords
To be passed to the actual plotting function

Expand Down
41 changes: 41 additions & 0 deletions pandas/tests/plotting/test_groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,11 +10,52 @@
import numpy as np

from pandas.tests.plotting.common import TestPlotBase
import pytest


@td.skip_if_no_mpl
class TestDataFrameGroupByPlots(TestPlotBase):

@pytest.mark.parametrize('equal_bins, bins, expected1, expected2', (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need two expected variables for the two groups. They correspond to the leftmost part of each histogram bar on the x-axis.

(True, 10,
[-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4],
[-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4]),
(True, None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the default is always 10 for bins then None is not a valid argument here and should in fact raise an error instead of giving valid results

[-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4],
[-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4]),
(True, [-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4, 3.],
[-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4],
[-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4]),
(False, [-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4, 3.],
[-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4],
[-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4]),
(False, 10,
[-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4],
[-1., -0.8, -0.6, -0.4, -0.2, 0., 0.2, 0.4, 0.6, 0.8]),
(False, None,
[-3., -2.4, -1.8, -1.2, -0.6, 0., 0.6, 1.2, 1.8, 2.4],
[-1., -0.8, -0.6, -0.4, -0.2, 0., 0.2, 0.4, 0.6, 0.8])
))
def test_hist_bins_match(self, equal_bins, bins, expected1, expected2):
# GH-22222
df = DataFrame(np.append(np.linspace(-3, 3, 100),
np.linspace(-1, 1, 100)),
columns=['value'])
df['group'] = [0] * 100 + [1] * 100
g = df.groupby('group')['value']
ax = g.hist(bins=bins, alpha=0.7, equal_bins=equal_bins)[0]

# x-axis (leftmost) hist bar coordinates:
# group0: points[:num_bins]
# group1: points[num_bins:]
num_bins = len(expected1)
points = np.array([patch.get_bbox().get_points()
for patch in ax.patches])[:, 0, 0]

tm.assert_almost_equal(points[:num_bins], np.array(expected1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would prefer to use tm.assert_numpy_array_equal instead here and on line below

tm.assert_almost_equal(points[num_bins:], np.array(expected2))
tm.close()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is to close the figure (to avoid overwriting it during parametrize)


def test_series_groupby_plotting_nominally_works(self):
n = 10
weight = Series(np.random.normal(166, 20, size=n))
Expand Down