-
- Notifications
You must be signed in to change notification settings - Fork 19.3k
BUG: for ordered categorical data implements correct computation of kendall/spearman correlations #62880
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
pandeconscious wants to merge 31 commits into pandas-dev:main Choose a base branch from pandeconscious:ordered_cat_corr
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline, and old review comments may become outdated.
Open
BUG: for ordered categorical data implements correct computation of kendall/spearman correlations #62880
Changes from 1 commit
Commits
Show all changes
31 commits Select commit Hold shift + click to select a range
1f8c628 init commit kendall spearman ordinal cats
pandeconscious 906f1e4 Merge branch 'pandas-dev:main' into ordered_cat_corr
pandeconscious 497dc7e series test update and fixes
pandeconscious 583aca6 cat desc longer in tests
pandeconscious e069810 testing frame corr
pandeconscious b90726f pre commit fixes v2
pandeconscious 65a506c cleanup
pandeconscious ab3b8b9 Merge branch 'pandas-dev:main' into ordered_cat_corr
pandeconscious e93ed83 test import scipy fix
pandeconscious ec4d97e rst sorting autofix
pandeconscious ebfc3b0 Merge branch 'pandas-dev:main' into ordered_cat_corr
pandeconscious 8cfacef Merge branch 'pandas-dev:main' into ordered_cat_corr
pandeconscious 7ef7fb2 Merge branch 'pandas-dev:main' into ordered_cat_corr
pandeconscious 588808a refactor
pandeconscious c484552 fix dtype for duplicates
pandeconscious 216475c Merge branch 'pandas-dev:main' into ordered_cat_corr
pandeconscious e997747 clean up
pandeconscious 4184167 Merge branch 'pandas-dev:main' into ordered_cat_corr
pandeconscious 8bcd3dc Merge branch 'pandas-dev:main' into ordered_cat_corr
pandeconscious 2673281 clean up
pandeconscious ff48847 import fix
pandeconscious 1c69e29 test tranform ordered cat func
pandeconscious 8b26a7d tests and mypy fixes
pandeconscious a625520 type check fix
pandeconscious 259424e addressing review comments
pandeconscious f141e6a Merge branch 'main' into ordered_cat_corr
pandeconscious d2d0f71 type fix corr.py
pandeconscious 858d0c2 ruff format
pandeconscious a8c88c7 mypy fix
pandeconscious 1a472e3 Merge branch 'main' into ordered_cat_corr
pandeconscious 71305aa scipy unavailable fix in test
pandeconscious File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
addressing review comments
- Loading branch information
commit 259424e6ba56c60e3912c11ca16763b476f352db
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -187,19 +187,19 @@ def test_corr_callable_method(self, datetime_series): | |
| | ||
| @pytest.mark.parametrize("method", ["kendall", "spearman"]) | ||
| @pytest.mark.parametrize( | ||
| "ord_cat_series", | ||
| "cat_series", | ||
| [ | ||
| Series( # ordered categorical series | ||
| pd.Categorical( | ||
| ["low", "med", "high", "very_high"], | ||
| categories=["low", "med", "high", "very_high"], | ||
| Series( | ||
| pd.Categorical( # ordered cat series | ||
| ["low", "medium", "high"], | ||
| categories=["low", "medium", "high"], | ||
| ordered=True, | ||
| ) | ||
| ), | ||
| Series( # ordered categorical series with nan and a different ranking | ||
| pd.Categorical( | ||
| ["h", "low", "vh", None], | ||
| categories=["low", "m", "h", "vh"], | ||
| Series( | ||
| pd.Categorical( # ordered cat series with NA | ||
| ["low", "medium", "high", None], | ||
| categories=["low", "medium", "high"], | ||
| ordered=True, | ||
| ) | ||
| ), | ||
| | @@ -208,36 +208,23 @@ def test_corr_callable_method(self, datetime_series): | |
| @pytest.mark.parametrize( | ||
| "other_series", | ||
| [ | ||
| Series( # int series against which tord cat series is correlated | ||
| [0, 1, 2, 3] | ||
| ), | ||
| Series( # float series against which ord cat series is correlated | ||
| [2.0, 3.0, 4.5, 6.5] | ||
| ), | ||
| Series( # other ord cat series against which ord cat series is correlated | ||
| Series( # other cat ordered series | ||
| pd.Categorical( | ||
| ["high", "low", "very_high", "med"], | ||
| categories=["low", "med", "high", "very_high"], | ||
| ["m", "l", "h"], | ||
| categories=["l", "m", "h"], | ||
| ordered=True, | ||
| ) | ||
| ), | ||
| # other non cat series | ||
| Series([2, 1, 3]), | ||
| ], | ||
| ) | ||
| def test_corr_rank_ordered_categorical( | ||
| Member There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This test is pretty long, to the point where its unclear what its intent is. Maybe its worth breaking up into a few tests? Or adding parameterization? Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. fixed | ||
| self, | ||
| method, | ||
| ord_cat_series, | ||
| cat_series, | ||
| other_series, | ||
| ): | ||
| stats = pytest.importorskip("scipy.stats") | ||
| method_scipy_func = {"kendall": stats.kendalltau, "spearman": stats.spearmanr} | ||
| ord_ser_cat_codes = ord_cat_series.cat.codes.replace(-1, np.nan) | ||
| | ||
| if other_series.dtype == "category" and other_series.cat.ordered: | ||
| other_series = other_series.cat.codes.replace(-1, np.nan) | ||
| | ||
| corr_calc = ord_cat_series.corr(other_series, method=method) | ||
| corr_expected = method_scipy_func[method]( | ||
| ord_ser_cat_codes, other_series, nan_policy="omit" | ||
| )[0] | ||
| tm.assert_almost_equal(corr_calc, corr_expected) | ||
| expected_corr = {"kendall": 0.33333333333333337, "spearman": 0.5} | ||
| corr_calc = cat_series.corr(other_series, method=method) | ||
| tm.assert_almost_equal(corr_calc, expected_corr[method]) | ||
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit. This suggestion is invalid because no changes were made to the code. Suggestions cannot be applied while the pull request is closed. Suggestions cannot be applied while viewing a subset of changes. Only one suggestion per line can be applied in a batch. Add this suggestion to a batch that can be applied as a single commit. Applying suggestions on deleted lines is not supported. You must change the existing code in this line in order to create a valid suggestion. Outdated suggestions cannot be applied. This suggestion has been applied or marked resolved. Suggestions cannot be applied from pending reviews. Suggestions cannot be applied on multi-line comments. Suggestions cannot be applied while the pull request is queued to merge. Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this test is necessary; your other tests are sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function in itself can also be potentially used for things other than correlation as it is a specific type of transformation. Correlation is one use case of transforming to these codes, so to me it seems like this function should be anyway tested for what it is supposed to do irrespective of its use in correlation. Please lmk what do you think.