-
- Notifications
You must be signed in to change notification settings - Fork 19.4k
TST: Test sorting levels not aligned with index (#25775) #26492
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jschendel left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! A couple comments on simplifying the implementation.
pandas/tests/frame/test_sorting.py Outdated
| @pytest.mark.parametrize('seed', list(range(100))) | ||
| def test_sort_multi_index(self, seed): | ||
| # GH 25775 | ||
| pd.np.random.seed(seed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a single deterministic example is probably easier to read and maintain.
I've found that the following example reproduces the issue on 0.23.0:
In [1]: import pandas as pd; pd.__version__ Out[1]: '0.23.0' In [2]: df = pd.DataFrame({'a': [0.4, 0.1, 0.3, 0.2], 'b': [0, 1, 1, 1], 'c': range(4), 'd': list('abcd')}) In [3]: df.set_index(list('abc')).sort_index(level=list('ba')) Out[3]: d a b c 0.4 0 0 a 0.1 1 1 b 0.3 1 2 c 0.2 1 3 dAnd works on master:
In [1]: import pandas as pd; pd.__version__ Out[1]: '0.25.0.dev0+596.g20d0ad159a' In [2]: df = pd.DataFrame({'a': [0.4, 0.1, 0.3, 0.2], 'b': [0, 1, 1, 1], 'c': range(4), 'd': list('abcd')}) In [3]: df.set_index(list('abc')).sort_index(level=list('ba')) Out[3]: d a b c 0.4 0 0 a 0.1 1 1 b 0.2 1 3 d 0.3 1 2 cThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok! I updated to a single example. I'm using the one given in #25775.
pandas/tests/frame/test_sorting.py Outdated
| def test_sort_multi_index(self, seed): | ||
| # GH 25775 | ||
| pd.np.random.seed(seed) | ||
| df1 = (pd.DataFrame(pd.np.random.rand(10, 5), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataFrame is already imported, so you don't need the pd. prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, updated
Codecov Report
@@ Coverage Diff @@ ## master #26492 +/- ## ========================================== - Coverage 91.88% 91.87% -0.01% ========================================== Files 174 174 Lines 50692 50692 ========================================== - Hits 46576 46571 -5 - Misses 4116 4121 +5
Continue to review full report at Codecov.
|
7efc4b8 to 58d8782 Compare a89216a to 2a3bf94 Compare pandas/tests/frame/test_sorting.py Outdated
| assert_frame_equal(sorted_df, expected) | ||
| | ||
| def test_sort_multi_index(self): | ||
| # GH 25775 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i am not clear by this example what you are even trying to test here. can you add some comments on the test and make the example a little more fluid (maybe add some comments).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to the simple example by the other reviewer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment here on what this is testing, example is nice
pandas/tests/frame/test_sorting.py Outdated
| '5,0.639,1,0.944,0.521,0.414' + \ | ||
| '6,0.264,7,0.456,0.568,0.018' + \ | ||
| '7,0.617,6,0.616,0.943,0.681' + \ | ||
| '8,0.359,4,0.697,0.060,0.666' + \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason we are using a fixed string here rather than a generated frame with a seed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other reviewer requested "a single deterministic example" in contrast to iterating through a range of seeds. I wasn't sure what they exactly meant by "deterministic" i.e. whether fixing a seed is good enough or should the dataframe be explicitly given
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I meant a single explicit DataFrame that minimally reproduces the error. Setting a random seed and the generating random floats had nothing to do with the actual error in question, so IMO it made it less obvious as to what's going on. It also resulted in there be a lot more data than necessary to reproduce the error, which makes it harder to follow what's being tested.
So basically something like:
df = DataFrame({'a': [3, 1, 2], 'b': [0, 0, 0], 'c': [0, 1, 2], 'd': list('abc')}) result = df.set_index(list('abc')).sort_index(level=list('ba')) expected = DataFrame({'a': [1, 2, 3], 'b': [0, 0, 0], 'c': [1, 2, 0], 'd': list('bca')}) expected = expected.set_index(list('abc')) tm.assert_frame_equal(result, expected)Note that df as defined above does indeed not sort correctly on 0.23.4:
In [1]: import numpy as np; import pandas as pd; pd.__version__ Out[1]: '0.23.4' In [2]: df = pd.DataFrame({'a' : [3, 1, 2], 'b': [0, 0, 0,], 'c': [0, 1, 2], 'd': list('abc')}) In [3]: df.set_index(list('abc')).sort_index(level=list('ba')) Out[3]: d a b c 3 0 0 a 1 0 1 b 2 0 2 cI suppose you could parametrize over level if you wanted to, as adding 'c' still results in invalid sorting:
In [4]: df.set_index(list('abc')).sort_index(level=list('bac')) Out[4]: d a b c 3 0 0 a 1 0 1 b 2 0 2 cThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jschendel, switched to your simpler example.
pandas/tests/frame/test_sorting.py Outdated
| '9,0.670,2,0.128,0.315,0.363' | ||
| data = StringIO(s) | ||
| df1 = (pd.read_csv(data) | ||
| .assign(b=lambda df: (df.b * 10).astype(int))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these df1, df2 are not clear at all can you rename to something more descriptive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to the simple example by the other reviewer
2a3bf94 to 1763b74 Compare
jreback left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls merge master and ping on green.
pandas/tests/frame/test_sorting.py Outdated
| assert_frame_equal(sorted_df, expected) | ||
| | ||
| def test_sort_multi_index(self): | ||
| # GH 25775 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment here on what this is testing, example is nice
1763b74 to 5eb692b Compare | all green @jreback |
| thanks @mahepe |
sort_indexdoes not work with levels not aligned with index #25775git diff upstream/master -u -- "*.py" | flake8 --diffWith
masterat 6d2398a, issue #25775 seems resolved. Added validation tests.