Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
Initial commit
GH21220
  • Loading branch information
KalyanGokhale committed May 27, 2018
commit 91a3f1663437293eb1ede6e8784bfd7c95fb9b9a
20 changes: 13 additions & 7 deletions pandas/core/reshape/merge.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@
import pandas.compat as compat

from pandas import (Categorical, DataFrame,
Index, MultiIndex, Timedelta)
Index, MultiIndex, Timedelta, Series)
from pandas.core.arrays.categorical import _recode_for_categories
from pandas.core.frame import _merge_doc
from pandas.core.dtypes.common import (
Expand Down Expand Up @@ -492,6 +492,10 @@ def __init__(self, left, right, how='inner', on=None,
left_index=False, right_index=False, sort=True,
suffixes=('_x', '_y'), copy=True, indicator=False,
validate=None):
if isinstance(left, Series):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if u are actually going to do this
then you need to strip a ton of logic from below

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify this comment? What logic would need to be stripped?

Currently, this just raises when you pass a Series (see line 525 a bit below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks both - will make edits to this based on @jreback 's clarification

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm on the code removal, we do extension promotion in concat, but nothing here.

left = left.to_frame()
if isinstance(right, Series):
right = right.to_frame()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to validate that the input is a Series or Dataframe and nothing else

Copy link
Contributor Author

@KalyanGokhale KalyanGokhale May 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - Had initially addeded these validations at 525 and 528 and also before 541 (if memory serves right) on my local machine - but then it fails at 571...

Thus after trying to fix the same issue (checking if input is Series or DataFrame) at multiple places, the current change seemed like the most logical fix as we are changing the type of incoming Series object to DataFrame and then avoiding any susequent changes to the code....

However open to suggestions...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is ok, but what I mean is validation that you have a Series/DataFrame and nothing else. (and a test to go with that), something like

left = validate_operand(left) right = validate_operand(right) ...... def validate_operand(obj): if isinstance(obj, DataFrame): return obj elif isinstance(obj, Series): return obj.to_frame() raise ValueError(.....) 
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also need to test this with left_index=True, right_index=True (which should be fine). And a test that a Series w/o a name raises (it will convert just fine, but then you would have to specify the on in a funny way)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review - done.
also raises a ValueError for Series w/o a name

self.left = self.orig_left = left
self.right = self.orig_right = right
self.how = how
Expand Down Expand Up @@ -535,12 +539,14 @@ def __init__(self, left, right, how='inner', on=None,
'{right_index}'.format(right_index=type(right_index)))

# warn user when merging between different levels
if left.columns.nlevels != right.columns.nlevels:
msg = ('merging between different levels can give an unintended '
'result ({left} levels on the left, {right} on the right)'
).format(left=left.columns.nlevels,
right=right.columns.nlevels)
warnings.warn(msg, UserWarning)
if isinstance(left, DataFrame) and isinstance(right, DataFrame):
if left.columns.nlevels != right.columns.nlevels:
msg = ('merging between different levels can give an '
'unintended result ({left} levels on the left, '
'{right} on the right)'
).format(left=left.columns.nlevels,
right=right.columns.nlevels)
warnings.warn(msg, UserWarning)

self._validate_specification()

Expand Down
21 changes: 21 additions & 0 deletions pandas/tests/reshape/merge/test_merge.py
Original file line number Diff line number Diff line change
Expand Up @@ -1864,3 +1864,24 @@ def test_merge_index_types(index):
OrderedDict([('left_data', [1, 2]), ('right_data', [1.0, 2.0])]),
index=index)
assert_frame_equal(result, expected)


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also parameterize this as much as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

def test_merge_series():
# GH 21220
a = pd.DataFrame({"A": [1, 2, 3, 4]},
index=pd.MultiIndex.from_product([['a', 'b'], [0, 1]],
names=['outer', 'inner']))
b = pd.Series([1, 2, 3, 4],
index=pd.MultiIndex.from_product([['a', 'b'], [1, 2]],
names=['outer', 'inner']), name='B')
expected = pd.DataFrame({"A": [2, 4], "B": [1, 3]},
index=pd.MultiIndex.from_product([['a', 'b'], [1]],
names=['outer', 'inner']))

# Testing current merge behvaior is as before
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these comments are not meaningful pls change them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

result = pd.merge(a, b.to_frame(), on=['outer', 'inner'])
tm.assert_frame_equal(result, expected)

# Testing changed merge behvaior is as expected
result = pd.merge(a, b, on=['outer', 'inner'])
tm.assert_frame_equal(result, expected)