Skip to content
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -640,6 +640,7 @@ Plotting

- Bug in :func:`DataFrame.plot.scatter` and :func:`DataFrame.plot.hexbin` caused x-axis label and ticklabels to disappear when colorbar was on in IPython inline backend (:issue:`10611`, :issue:`10678`, and :issue:`20455`)
- Bug in plotting a Series with datetimes using :func:`matplotlib.axes.Axes.scatter` (:issue:`22039`)
- The new argument ``equal_bins`` in :func:`DataFrameGroupBy.hist` sets histogram bins to equal values (:issue:`22222`)

Groupby/Resample/Rolling
^^^^^^^^^^^^^^^^^^^^^^^^
Expand Down
12 changes: 12 additions & 0 deletions pandas/core/groupby/groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ class providing the base-class of operations.
from pandas.core.dtypes.common import (
is_numeric_dtype,
is_scalar,
is_integer,
ensure_float)
from pandas.core.dtypes.cast import maybe_downcast_to_dtype
from pandas.core.dtypes.missing import isna, notna
Expand Down Expand Up @@ -581,6 +582,17 @@ def wrapper(*args, **kwargs):
if 'axis' not in kwargs_with_axis or \
kwargs_with_axis['axis'] is None:
kwargs_with_axis['axis'] = self.axis
if (name == 'hist' and
kwargs.get('equal_bins', False) is True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need is True here or can we use the implicit truthiness of objects?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just a bit safer this way, as it guards against some non-boolean statements by the user, such as equal_bins=None. But your suggestion should be fine if we ignore bad inputs from user.

# GH-22222
bins = kwargs.get('bins')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm wondering if we even need to use .get here - can we just access by index? I'm assuming that 'equal_bins' should always be there so a KeyError when it's not would be fine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same py2/3 issue. In py3 equal_bins, bins, etc do not take their default values unless the code inside hist_series function is being run.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not add anything here

this is obscuring already extremely fragile code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean not in the wrapper? Or also beyond that? Is there any other place you'd suggest?

if bins is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this? Doesn't the keyword already default to 10?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The keyword defaults to 10 after the function is called (curried_axis), but we need its value before that (on line 593)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point me to where bins doesn't default to 10?

bins = 10 # use default value used in `hist_series`
if is_integer(bins):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just use is_scalar here since we only need to guard against scalar vs sequence

# share the same numpy array for all group bins
bins = np.linspace(self.obj.min(),
self.obj.max(), bins + 1)
kwargs['bins'] = bins

def curried_with_axis(x):
return f(x, *args, **kwargs_with_axis)
Expand Down
9 changes: 6 additions & 3 deletions pandas/plotting/_core.py
Original file line number Diff line number Diff line change
Expand Up @@ -2443,7 +2443,7 @@ def hist_frame(data, column=None, by=None, grid=True, xlabelsize=None,

def hist_series(self, by=None, ax=None, grid=True, xlabelsize=None,
xrot=None, ylabelsize=None, yrot=None, figsize=None,
bins=10, **kwds):
bins=10, equal_bins=False, **kwds):
Copy link
Contributor Author

@javadnoorb javadnoorb Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying the alternative approach here. equal_bins will be unused within the function. This is just to avoid popping kwds.

This also takes care of py2/3 compat issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might have missed the compat issue but if you are using .get it shouldn't matter whether this exists as a keyword or not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kwds can only contain matploltib.pyplot.hist input args (b/c of line 2504) and throws an error if equal_bins is a keyword.

"""
Draw histogram of the input series using matplotlib

Expand All @@ -2470,8 +2470,11 @@ def hist_series(self, by=None, ax=None, grid=True, xlabelsize=None,
bin edges are calculated and returned. If bins is a sequence, gives
bin edges, including left edge of first bin and right edge of last
bin. In this case, bins is returned unmodified.
bins: integer, default 10
Number of histogram bins to be used
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is repeated and seems redundant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeated where? This should still stay, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bins is defined on line 2468 and 2473. It looks like a mistake when writing the docstring. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK didn't see that. This is fine then

equal_bins : boolean, default False
Uses the overall maximum and minimum of the groups to set a shared
bins sequence, leading to equal bin widths for all
groups (only works if bins==None or int).

`**kwds` : keywords
To be passed to the actual plotting function

Expand Down
44 changes: 44 additions & 0 deletions pandas/tests/plotting/test_groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,10 +10,54 @@
import numpy as np

from pandas.tests.plotting.common import TestPlotBase
import pytest


@td.skip_if_no_mpl
class TestDataFrameGroupByPlots(TestPlotBase):
@pytest.mark.parametrize(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Readability nit but let's add an empty line here

'bins, equal_bins',
zip([5, None, np.linspace(-3, 3, 10)], [True, False])
)
Copy link
Contributor Author

@javadnoorb javadnoorb Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying all "possible" forms of bins with and without equal_bins

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would still prefer if you could be explicit about what is expected here, i.e. add expected as a third parameter here and assert that we get the correct number of bins across the two groups

def test_hist_bins_match(self, bins, equal_bins):
# GH-22222
N = 100
df = DataFrame(np.append(np.linspace(-3, 3, N),
np.linspace(-1, 1, N)),
columns=['rand'])
df['group'] = [0] * N + [1] * N
g = df.groupby('group')['rand']
axes = g.hist(bins=bins, alpha=0.7, equal_bins=equal_bins)
ax = axes[0]

if bins is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just generally avoid this type of code in tests - you aren't asserting anything about the functionality of the code being tested but rather just adding separate logic here in the test case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test should be slightly different, depending on the type provided for bins. Do you recommend dropping parametrize decorator and writing separate tests for these instead?

num_bins = 10 # default value used in `hist_series`
elif type(bins) == np.ndarray:
num_bins = len(bins) - 1
else:
num_bins = bins

# individual hist bar coordinates for:
# group0: points[:num_bins]
# group1: points[num_bins:]
points = np.array([patch.get_bbox().get_points()
for patch in ax.patches])

if equal_bins or type(bins) == np.ndarray:
# compare leftmost point on x-axis for each bar
# in the two groups
assert np.isclose(points[:num_bins, 0, 0],
points[num_bins:, 0, 0]
).all()
else:
# range of values on x-axis (min, max) for each group
hist_ranges = DataFrame(
[[points[0, 0, 0], points[num_bins - 1, 1, 0]],
[points[num_bins, 0, 0], points[-1, 1, 0]]],
index=[0, 1], columns=['min', 'max'])
group_ranges = g.agg([min, max])
assert np.isclose(group_ranges, hist_ranges).all()
tm.close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the purpose of this call is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tm.close() closes any open figure. Otherwise because of pytest.parameterize we will keep plotting on the same figure and accumulate counts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is to close the figure (to avoid overwriting it during parametrize)


def test_series_groupby_plotting_nominally_works(self):
n = 10
Expand Down