ENH: Add axis argument to Dataframe.corr #35984
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -20,6 +20,7 @@ | |
| TYPE_CHECKING, | ||
| Any, | ||
| AnyStr, | ||
| Callable, | ||
| Dict, | ||
| FrozenSet, | ||
| Hashable, | ||
| | @@ -5787,7 +5788,7 @@ def nsmallest(self, n, columns, keep="first") -> "DataFrame": | |
| population GDP alpha-2 | ||
| Tuvalu 11300 38 TV | ||
| Anguilla 11300 311 AI | ||
| Iceland 337000 17036 IS | ||
| Iceland 337000 17036 IS | ||
| | ||
| When using ``keep='last'``, ties are resolved in reverse order: | ||
| | ||
| | @@ -8116,9 +8117,14 @@ def _series_round(s, decimals): | |
| # ---------------------------------------------------------------------- | ||
| # Statistical methods, etc. | ||
| | ||
| def corr(self, method="pearson", min_periods=1) -> "DataFrame": | ||
| def corr( | ||
| self, | ||
| method: Union[str, Callable[[np.ndarray, np.ndarray], np.float64]] = "pearson", | ||
| min_periods: Optional[int] = 1, | ||
| axis: Union[str, int] = 0, | ||
| ) -> "DataFrame": | ||
| """ | ||
| Compute pairwise correlation of columns, excluding NA/null values. | ||
| Compute pairwise correlation of rows or columns, excluding NA/null values. | ||
| | ||
| Parameters | ||
| ---------- | ||
| | @@ -8140,6 +8146,12 @@ def corr(self, method="pearson", min_periods=1) -> "DataFrame": | |
| to have a valid result. Currently only available for Pearson | ||
| and Spearman correlation. | ||
| | ||
| axis : {0 or 'index', 1 or 'columns'}, default 0 | ||
| The axis to use. 0 or 'index' to compute column-wise, 1 or 'columns' for | ||
| row-wise. | ||
kc611 marked this conversation as resolved. | ||
| | ||
| .. versionadded:: 1.2.0 | ||
| | ||
| Returns | ||
| ------- | ||
| DataFrame | ||
| | @@ -8162,12 +8174,22 @@ def corr(self, method="pearson", min_periods=1) -> "DataFrame": | |
| dogs cats | ||
| dogs 1.0 0.3 | ||
| cats 0.3 1.0 | ||
| >>> df.corr(method=histogram_intersection, axis=1) | ||
| Contributor: Add a blank line before this. | ||
| Contributor: Add a comment as well. | ||
| 0 1 2 3 | ||
| 0 1.0 0.3 0.2 0.3 | ||
| 1 0.3 1.0 0.0 0.1 | ||
| 2 0.2 0.0 1.0 0.2 | ||
| 3 0.3 0.1 0.2 1.0 | ||
| """ | ||
| numeric_df = self._get_numeric_data() | ||
| cols = numeric_df.columns | ||
| axis = numeric_df._get_axis_number(axis) | ||
| cols = numeric_df._get_agg_axis(axis) | ||
| idx = cols.copy() | ||
| mat = numeric_df.to_numpy(dtype=float, na_value=np.nan, copy=False) | ||
| | ||
| if axis == 1: | ||
| Contributor: Don't we have to transpose the results? | ||
| Contributor: I don't think we do, since the result is symmetric. | ||
| mat = mat.transpose() | ||
| | ||
| if method == "pearson": | ||
| correl = libalgos.nancorr(mat, minp=min_periods) | ||
| elif method == "spearman": | ||
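The change above transposes the input matrix when `axis=1`, and the review thread notes that the output needs no transposing because a correlation matrix is symmetric. A minimal sketch of that idea using only the public pandas API follows; `corr_along_axis` is a hypothetical helper written for illustration, not the code in this PR, and it uses distinct row and column labels so the labelling of the result is visible.

```python
import numpy as np
import pandas as pd


def corr_along_axis(df, method="pearson", min_periods=1, axis=0):
    """Hypothetical helper (not the pandas implementation).

    axis=0 or 'index' correlates columns; axis=1 or 'columns' correlates rows.
    """
    axis_numbers = {0: 0, "index": 0, 1: 1, "columns": 1}
    if axis not in axis_numbers:
        raise ValueError(f"No axis named {axis!r}")
    numeric = df.select_dtypes(include="number")
    if axis_numbers[axis] == 1:
        # Transpose the input so that each original row becomes a column.
        numeric = numeric.T
    return numeric.corr(method=method, min_periods=min_periods)


df = pd.DataFrame(
    np.random.default_rng(0).normal(size=(10, 4)),
    index=list("abcdefghij"),      # row labels differ from column labels
    columns=["w", "x", "y", "z"],
)

row_corr = corr_along_axis(df, axis=1)

# The result is labelled by the original row index on both axes,
# and it is symmetric, so transposing the output would change nothing.
print(row_corr.shape)
print(np.allclose(row_corr, row_corr.T))
```

Only the Pearson method is exercised here, so the sketch does not depend on SciPy.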
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -175,6 +175,15 @@ def test_corr_int(self): | |
| df3.cov() | ||
| df3.corr() | ||
| | ||
| @td.skip_if_no_scipy | ||
| @pytest.mark.parametrize("meth", ["pearson", "spearman", "kendall"]) | ||
| def test_corr_axes(self, meth): | ||
| # https://github.com/pandas-dev/pandas/issues/35002 | ||
| df = pd.DataFrame(np.random.normal(size=(10, 4))) | ||
| Contributor: Put axis labels that are different for rows and columns, and this should fail (need to handle that). | ||
| expected = df.T.corr(meth, axis=0) | ||
| Member: I think it's usually encouraged to explicitly write out the expected DataFrame so that | ||
| Author: Yeah, I could do that, but wouldn't it just be a test for the DataFrame.corr function itself? The original operations on the matrix are left unchanged. Personally, I don't think explicitly writing out the DataFrame is needed in this case, unless (as you suggested) instead of taking a transpose we implement a workaround that changes the main function itself. | ||
| Member: OK sure, perhaps wait for others' comments then. | ||
| Contributor: The test comes close to being circular, but I think it's probably okay here. In this case it's hard to explicitly construct the expected DataFrame for all methods "from scratch" without either trivial input data or messy juggling of different scipy functions. | ||
| result = df.corr(meth, axis=1) | ||
| tm.assert_frame_equal(result, expected) | ||
| | ||
| @td.skip_if_no_scipy | ||
| @pytest.mark.parametrize( | ||
| "nullable_column", [pd.array([1, 2, 3]), pd.array([1, 2, None])] | ||
| | ||
Review comment: Can you move this signature to an alias, put it in pandas._typing, and call it MethodWithCallable?
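One way the requested alias could look is sketched below. The name `MethodWithCallable` comes from the review comment and the union mirrors the annotation in the diff; where the alias would live and how it would be consumed are assumptions, and `describe_method` is a hypothetical function added only to show the alias in a signature.

```python
from typing import Callable, Union

import numpy as np

# Hypothetical alias, mirroring the annotation on `method` in the diff:
# either a method name or a callable taking two 1-D arrays and returning a float.
MethodWithCallable = Union[str, Callable[[np.ndarray, np.ndarray], np.float64]]


def describe_method(method: MethodWithCallable) -> str:
    """Hypothetical consumer of the alias: return a readable name for `method`."""
    if isinstance(method, str):
        return method
    return getattr(method, "__name__", "callable")


def histogram_intersection(a: np.ndarray, b: np.ndarray) -> np.float64:
    # Example callable of the accepted shape.
    return np.minimum(a, b).sum().round(decimals=1)


print(describe_method("pearson"))
print(describe_method(histogram_intersection))
```

With an alias like this, the `corr` signature could annotate `method` with a single name instead of repeating the full union.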