ENH: Allow for join between two multi-index dataframe instances #20356
fa4dbac to d694f25 Compare

'LinkType', 'Distance'])
.set_index(['Origin', 'Destination', 'Period', 'LinkType']))

def f():
are there other error conditions to test?
This test fails and I'm not sure how we should handle join on empty levels
can you show a mini-example?
@jreback Here is an example of a join on 2 multilevel indexed (same levels) dfs using two different methods
- pd.merge(df1.reset_index(), df2.reset_index(),...)
- df1.join(df2)
The results differ. Do you think that's an issue? I'm facing a similar issue when I try a multi-level join.
import numpy as np
import pandas as pd

join_type = 'left'

left_multi = (
    pd.DataFrame(
        dict(Origin=['A', 'A', 'B', 'B', 'C'],
             Destination=[np.nan] * 5,
             Trips=[1987, 3647, 2470, 4296, 4444]),
        columns=['Origin', 'Destination', 'Trips'])
    .set_index(['Origin', 'Destination']))

right_multi = (
    pd.DataFrame(
        dict(Origin=['A', 'A', 'B', 'B', 'C', 'C', 'E'],
             Destination=[np.nan] * 7,
             Distance=[100, 80, 90, 80, 75, 35, 55]),
        columns=['Origin', 'Destination', 'Distance'])
    .set_index(['Origin', 'Destination']))

on_cols = ['Origin', 'Destination']
idx_cols = ['Origin', 'Destination']

expected = (pd.merge(left_multi.reset_index(),
                     right_multi.reset_index(),
                     how=join_type, on=on_cols).set_index(idx_cols)
            .sort_index())

result = left_multi.join(right_multi, how=join_type).sort_index()

print(expected)
print(result)

pandas/core/reshape/merge.py Outdated

# Inject -1 in the labels list where a join was not possible
# IOW indexer[i]=-1
labels = [restore_labels[i] if i != -1 else -1 for i in indexer]
this should be a set operation on the arrays i think
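As a rough illustration of what an array-level version of the list comprehension above could look like: this is only a sketch, not necessarily what the reviewer had in mind or what was eventually merged. The values of `restore_labels` and `indexer` are made up, and plain NumPy calls (`take`, `np.where`) stand in for whatever internal pandas helper the PR ends up using.

```python
import numpy as np

# Hypothetical inputs: codes of a dropped level and an indexer produced by a join.
# A value of -1 in `indexer` marks a row for which no join partner was found.
restore_labels = np.array([2, 0, 1, 1], dtype=np.intp)
indexer = np.array([0, 3, -1, 1, -1], dtype=np.intp)

# Element-by-element form, as in the diff under review.
labels_loop = [restore_labels[i] if i != -1 else -1 for i in indexer]

# Array form: take every position at once, then overwrite the slots where the
# indexer was -1 (take() would otherwise read the last element for those).
labels_vec = np.where(indexer == -1, -1, restore_labels.take(indexer))

print(labels_vec.tolist())
```

Both forms give the same labels; the array form avoids a Python-level loop over the indexer.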
Sorry @jreback but I'm not sure what you mean
I think this was addressed here (Thanks to @TomAugspurger):
Codecov Report
@@            Coverage Diff             @@
##           master   #20356      +/-   ##
==========================================
+ Coverage   92.24%   92.25%   +<.01%
==========================================
  Files         161      161
  Lines       51339    51376      +37
==========================================
+ Hits        47360    47397      +37
  Misses       3979     3979
310bf7a to a6c9733 Compare
Hello @harisbal! Thanks for updating the PR.
Comment last updated on November 11, 2018 at 04:31 Hours UTC
f668710 to 8e5fcf1 Compare
Any progress on this?
sorry, let me take a look. i know this has been outstanding for quite some time.
jreback left a comment
can you rebase and let's get this in
I'll take a look asap. Cheers
de6c469 to 50c90cc Compare
6bd10f4 to 5689f0a Compare
5689f0a to 2d61a12 Compare
Any idea why pandas-dev.pandas failed?
db133f0 to 01ae19e Compare
01ae19e to 4d4acc5 Compare
jreback left a comment
looks good, @jorisvandenbossche @TomAugspurger if you'd have a look
47bb4fe to 8b44f42 Compare
jreback left a comment
can you update some older questions
pandas/core/reshape/merge.py Outdated
Parameters
----------
left : Index
can you update this
pandas/core/reshape/merge.py Outdated
list of non-common levels
join_idx : Index
the index of the join between the common levels of left and right
lidx : intp array
can you update
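For readers unfamiliar with the `join_idx` and `lidx` parameters in the docstring above, the sketch below shows the kind of values they describe. It uses the public `Index.join(..., return_indexers=True)` call on two small, made-up flat indexes; the helper under review works on MultiIndex levels and is not reproduced here.

```python
import pandas as pd

# Hypothetical stand-ins for the common level(s) of the left and right frames.
left_keys = pd.Index(["a", "b", "c"], name="key")
right_keys = pd.Index(["b", "c", "d"], name="key")

# With return_indexers=True, join returns the joined index together with two
# arrays of integer positions: where each joined label sits in the left index
# (lidx) and where it sits in the right index (ridx).
join_idx, lidx, ridx = left_keys.join(right_keys, how="inner", return_indexers=True)

print(list(join_idx))  # labels present in both indexes
print(list(lidx))      # their positions in left_keys
print(list(ridx))      # their positions in right_keys
```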
pandas/core/reshape/merge.py Outdated
Returns
-------
levels : intp ndarray
is this correct?
6d6678b to 0e9c060 Compare
0e9c060 to f54c151 Compare
…ex-join
# Conflicts:
# doc/source/whatsnew/v0.24.0.txt
# pandas/core/reshape/merge.py
# pandas/tests/reshape/merge/test_multi.py
be4aec7 to ecaf515 Compare
How's this looking? I haven't checked on the changes in a while, but CI is passing.
@TomAugspurger I had some more comments. let me have a look again.
@harisbal can you merge master. @TomAugspurger this lgtm. let's merge and we can follow up on any small issues.
Merged master. Ping on green.
Shall I try to merge again?
I restarted that crashed worker. I haven't seen that failure before.
All green. Merging. Thanks!
@jreback @TomAugspurger @WillAyd Thank you so much for everything!!
closes #16162
closes #6360
Allow joining on multiple levels for multi-indexed DataFrame instances
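A minimal sketch of the behaviour this PR enables, assuming a pandas version that includes it (0.24.0 or later, going by the whatsnew file touched above). The frames, level names (`key`, `X`, `Y`) and values are made up for illustration: two MultiIndexed frames share only the `key` level, and `DataFrame.join` matches rows on that overlapping level while keeping the non-shared levels.

```python
import pandas as pd

# Illustrative data only: both frames are indexed by "key" plus one extra level.
left = pd.DataFrame(
    {"A": ["A0", "A1", "A2"], "B": ["B0", "B1", "B2"]},
    index=pd.MultiIndex.from_tuples(
        [("K0", "X0"), ("K0", "X1"), ("K1", "X2")], names=["key", "X"]
    ),
)
right = pd.DataFrame(
    {"C": ["C0", "C1", "C2", "C3"], "D": ["D0", "D1", "D2", "D3"]},
    index=pd.MultiIndex.from_tuples(
        [("K0", "Y0"), ("K1", "Y1"), ("K2", "Y2"), ("K2", "Y3")],
        names=["key", "Y"],
    ),
)

# Join on the overlapping index level ("key"); X and Y are carried along.
result = left.join(right, how="inner")

# The same result spelled out with reset_index + merge, as in the thread above.
expected = pd.merge(
    left.reset_index(), right.reset_index(), on=["key"], how="inner"
).set_index(["key", "X", "Y"])

print(result)
print(expected)
```

The `reset_index` + `merge` form also works on versions without this change, which makes it a convenient cross-check for the joined result.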