Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

df = pd.DataFrame(
    dict(
        Date=pd.period_range("2000-01-01", periods=10, name="Date"),
        Value=range(10),
    ),
)

result1 = df[df.Date == "2000-01-06"]  # Correct
result2 = df.query("Date == '2000-01-06'")  # Empty!

# To show that the period type is the problem
df.Date = df.Date.dt.to_timestamp()
result3 = df[df.Date == "2000-01-06"]  # Correct
result4 = df.query("Date == '2000-01-06'")  # Correct

Issue Description
I would expect query("Date == '2000-01-06'") to return the same result regardless of whether the Date column is datetime or period, given that df.Date == "2000-01-06" returns the same result in both cases.
Also, >=, <, etc. work as expected for the period column; it's just == that is wrong.
I dug a little deeper and it seems that query in this case uses .isin to evaluate ==, so the issue also shows up in this code without any eval involved:
pd.period_range("2000-01-01", periods=3).isin(['2000-01-02'])  # False, False, False
pd.date_range("2000-01-01", periods=3).isin(['2000-01-02'])  # False, True, False
pd.period_range("2000-01-01", periods=3) == '2000-01-02'  # False, True, False

From what I can tell, isin will attempt to turn the passed list into a PeriodArray here:
pandas/pandas/core/arrays/datetimelike.py
Line 766 in 9be48ef
values = type(self)._from_sequence(values)
But that fails because it can't work out the freq from plain strings, so isin falls back to element-wise equality, which compares periods against strings, and so everything is False.
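To illustrate why the element-wise fallback can never match, here is a minimal sketch using scalars only. It assumes nothing beyond the public pd.Period constructor; the variable names are mine.

```python
import pandas as pd

# A Period scalar with daily frequency.
period = pd.Period("2000-01-02", freq="D")

# Comparing a Period with a raw string does not parse the string,
# so the comparison is simply False.
period_vs_string = period == "2000-01-02"

# Once the string is parsed with an explicit freq, the two Periods are equal.
period_vs_period = period == pd.Period("2000-01-02", freq="D")

print(period_vs_string, period_vs_period)
```

So any path that ends up comparing Period objects with unparsed strings returns all False, which matches what isin shows above.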
Maybe there could be a check here to see whether this is a PeriodArray and, if so, pass self.freq? Or is there something like period_array_like that copies the freq/dtype from one period array to a new one?
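A rough user-level sketch of that idea, not a patch to the pandas internals: the helper isin_with_freq is hypothetical (it does not exist in pandas) and just parses the values with the caller's freq before calling isin.

```python
import pandas as pd


def isin_with_freq(periods, values):
    """Hypothetical helper: parse `values` using the freq of `periods`, then call isin."""
    # Passing freq explicitly lets pandas parse plain strings into Periods.
    parsed = pd.PeriodIndex(values, freq=periods.freq)
    return periods.isin(parsed)


idx = pd.period_range("2000-01-01", periods=3)
mask = isin_with_freq(idx, ["2000-01-02"])
print(mask)  # expected: [False  True False]
```

The same trick works as a workaround today: build the PeriodIndex with an explicit freq yourself and pass that to isin instead of a list of strings.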
Is it worth questioning why .query() uses isin to handle == in the first place? I have no idea, I'm sure there's a good reason.
Expected Behavior
Clear from the above?
Installed Versions
pd.show_versions() is still buggy and doesn't work on my machine.
I'm on version 2.0.3.