BUG: fix combine_first converting timestamp to int #35514
jreback left a comment
can you post an example of what the change would look like to a user (e.g. what the 'regression' is)
@jreback master is inconsistent: the dtype of the result depends on which frame calls `combine_first`.

    # master
    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame([[np.nan, 3.0, True], [-4.6, np.nan, True], [np.nan, 7.0, False]])
    df2 = pd.DataFrame([[-42.6, np.nan, True], [-5.0, 1.6, False]], index=[1, 2])
    expected = pd.Series([True, True, False], name=2)
    result1 = df1.combine_first(df2)[2]
    result2 = df2.combine_first(df1)[2]
    print(expected.dtype, result1.dtype, result2.dtype)
    # >>> bool bool object

    # master
    import pandas as pd

    df1 = pd.DataFrame({"a": [0, 1, 3, 5]}, dtype="int64")
    df2 = pd.DataFrame({"a": [1, 4]}, dtype="int64")
    res1 = df1.combine_first(df2)
    res2 = df2.combine_first(df1)
    print(res1["a"].dtype, res2["a"].dtype)
    # >>> int64 float64

The current fix makes the function consistent, but it does make the result differ from the expected one. In the first case we now get a Series of the same dtype both ways:

    # fix
    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame([[np.nan, 3.0, True], [-4.6, np.nan, True], [np.nan, 7.0, False]])
    df2 = pd.DataFrame([[-42.6, np.nan, True], [-5.0, 1.6, False]], index=[1, 2])
    expected = pd.Series([True, True, False], name=2)
    result1 = df1.combine_first(df2)[2]
    result2 = df2.combine_first(df1)[2]
    print(expected.dtype, result1.dtype, result2.dtype)
    # >>> bool object object

    # fix
    import pandas as pd

    df1 = pd.DataFrame({"a": [0, 1, 3, 5]}, dtype="int64")
    df2 = pd.DataFrame({"a": [1, 4]}, dtype="int64")
    res1 = df1.combine_first(df2)
    res2 = df2.combine_first(df1)
    print(res1["a"].dtype, res2["a"].dtype)
    # >>> float64 float64

Let me know your thoughts on this.
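The comparison in the comment above can be wrapped in a small helper that reports the dtype for both call orders. This is only a sketch: `combine_first_dtypes` is a hypothetical name, not part of pandas or of this PR, and the dtypes it reports depend on the pandas version installed, so no expected output is claimed.

```python
import pandas as pd


def combine_first_dtypes(left, right, column):
    """Return the dtype of `column` for both call orders of combine_first.

    Hypothetical helper for illustration; not part of pandas.
    """
    forward = left.combine_first(right)[column].dtype
    backward = right.combine_first(left)[column].dtype
    return str(forward), str(backward)


# Frames taken from the integer example in the comment above.
df1 = pd.DataFrame({"a": [0, 1, 3, 5]}, dtype="int64")
df2 = pd.DataFrame({"a": [1, 4]}, dtype="int64")

forward, backward = combine_first_dtypes(df1, df2, "a")
# The API is "consistent" in the sense discussed here when both orders agree.
status = "consistent" if forward == backward else "inconsistent"
print(forward, backward, status)
```

Comparing the two strings keeps the check independent of which dtype a given pandas release settles on.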
@nixphix this looks pretty good. can you merge master and we'll see if the CI passes
afd4a70 to 457a0ab

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.
jreback left a comment
looks good. if you can parameterize those tests and ping on green.
    result = df1.combine_first(df2)[2]
    expected = Series([True, True, False], name=2)
    tm.assert_series_equal(result, expected)
    expected1 = pd.Series([True, True, False], name=2, dtype=object)
can you parameterize this test (e.g. put the df1 and df2 in the parameter along with the expected)
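One way the requested parametrization could look, as a rough sketch only: the test name `test_combine_first_bool_values` and the `CASES` list are hypothetical, and the frames are copied from the example earlier in this thread. It checks values rather than dtype, because the resulting dtype is exactly what is under discussion and differs between master and the fix.

```python
import numpy as np
import pandas as pd
import pytest

# Frames copied from the example earlier in this conversation.
frame_a = pd.DataFrame(
    [[np.nan, 3.0, True], [-4.6, np.nan, True], [np.nan, 7.0, False]]
)
frame_b = pd.DataFrame([[-42.6, np.nan, True], [-5.0, 1.6, False]], index=[1, 2])

# Hypothetical parameter list: (caller, other, expected values of column 2).
CASES = [
    (frame_a, frame_b, [True, True, False]),
    (frame_b, frame_a, [True, True, False]),
]


@pytest.mark.parametrize("df1, df2, expected", CASES)
def test_combine_first_bool_values(df1, df2, expected):
    result = df1.combine_first(df2)[2]
    # Values only: the dtype (bool vs. object) is version dependent.
    assert result.tolist() == expected
```

Putting `df1`, `df2`, and the expected result in one parameter tuple lets both call orders share a single test body.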
    def test_combine_first_int(self):
        # GH14687 - integer series that do no align exactly

        df1 = pd.DataFrame({"a": [0, 1, 3, 5]}, dtype="int64")
can you parameterize this as well
@nixphix lgtm, can you merge master and address the test comments, ping on green.
    res = df1.combine_first(df2)
    tm.assert_frame_equal(res, df1)
    assert res["a"].dtype == "int64"
    exp1 = pd.DataFrame({"a": [0, 1, 3, 5]}, dtype="float64")
and make a separate test

this PR is prob ok, just needs a rebase and updating for comments.
        ],
    )
    def test_combine_first_timestamp_bug(val1, val2, nulls_fixture):
can you add a comment here "# GH#35514"
Closing in favor of #38145
`black pandas`
`git diff upstream/master -u -- "*.py" | flake8 --diff`

This fix introduced two regressions, but it appears that the fix only made the API consistent. Previously the failing cases were inconsistent: `df1.combine_first(df2)` would not return the same result as `df2.combine_first(df1)`. More on these in the code comments.

Let me know if there is a better way to handle this.