-
- Notifications
You must be signed in to change notification settings - Fork 19.4k
PERF: GH2003 Series.isin for categorical dtypes #20522
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 6 commits
19ac11a 54021b9 2514b45 80f687a d6c3953 33e3b07 ceffccd 3247dce 2b7b1c4 4478a49 64fef49 b25da12 9f8e790 60ac658 50aca26 713712e 18c827d 993afd8 a2b70ee fa7f0f1 7b680cd File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -148,3 +148,18 @@ def time_rank_int_cat(self): | |
| | ||
| def time_rank_int_cat_ordered(self): | ||
| self.s_int_cat_ordered.rank() | ||
| | ||
| | ||
| class IsIn(object): | ||
| | ||
| goal_time = 0.2 | ||
| | ||
| def setup(self): | ||
| n = 5 * 10**5 | ||
| sample_size = 100 | ||
| arr = ['s%04d' % i for i in np.random.randint(0, n // 10, size=n)] | ||
| self.sample = np.random.choice(arr, sample_size) | ||
| self.ts = pd.Series(arr).astype('category') | ||
| | ||
| def time_isin_categorical_strings(self): | ||
| ||
| self.ts.isin(self.sample) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -39,6 +39,8 @@ | |
| from pandas.util._decorators import ( | ||
| Appender, cache_readonly, deprecate_kwarg, Substitution) | ||
| | ||
| import pandas.core.algorithms as algorithms | ||
| | ||
| from pandas.io.formats.terminal import get_terminal_size | ||
| from pandas.util._validators import validate_bool_kwarg, validate_fillna_kwargs | ||
| from pandas.core.config import get_option | ||
| | @@ -2216,6 +2218,15 @@ def _concat_same_type(self, to_concat): | |
| def _formatting_values(self): | ||
| return self | ||
| | ||
| def isin(self, values): | ||
| from pandas.core.series import _sanitize_array | ||
| Contributor There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you add a doc-string here Contributor Author There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure | ||
| values = _sanitize_array(values, None, None) | ||
| null_mask = isna(values) | ||
| code_values = self.categories.get_indexer(values) | ||
| Contributor There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @TomAugspurger is there an EA version of this? (meaning API) (maybe should raise NotImplementedError which could be caught with a default implementation) Contributor There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There isn't yet, but I was thinking about add it. I actually define it on IPArray, since an IP address can be "in" a network. I'll open an issue (shouldn't delay the changes here). | ||
| code_values = code_values[null_mask | (code_values >= 0)] | ||
| return algorithms.isin(self.codes, code_values) | ||
| | ||
| | ||
| # The Series.cat accessor | ||
| | ||
| | ||
| | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -3564,7 +3564,10 @@ def isin(self, values): | |
| 5 False | ||
| Name: animal, dtype: bool | ||
| """ | ||
| result = algorithms.isin(com._values_from_object(self), values) | ||
| Contributor There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I wonder if the Contributor There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes let's try to do this, @Ma3aXaKa can you make this change | ||
| if is_categorical_dtype(self): | ||
| result = self._values.isin(values) | ||
| ||
| else: | ||
| result = algorithms.isin(com._values_from_object(self), values) | ||
| return self._constructor(result, index=self.index).__finalize__(self) | ||
| | ||
| def between(self, left, right, inclusive=True): | ||
| | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | @@ -47,3 +47,24 @@ def test_factorized_sort_ordered(): | |
| | ||
| tm.assert_numpy_array_equal(labels, expected_labels) | ||
| tm.assert_categorical_equal(uniques, expected_uniques) | ||
| | ||
| | ||
| def test_isin_cats(): | ||
| cat = pd.Categorical(["a", "b", np.nan]) | ||
| | ||
| Contributor There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can u add the issue number here as a comment | ||
| result = cat.isin(["a", np.nan]) | ||
| expected = np.array([True, False, True], dtype=bool) | ||
| tm.assert_numpy_array_equal(expected, result) | ||
| | ||
| result = cat.isin(["a", "c"]) | ||
| expected = np.array([True, False, False], dtype=bool) | ||
| Contributor There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. there are a couple of tests in pandas/tests/test_algos.py that test cats for isin, can you move here | ||
| tm.assert_numpy_array_equal(expected, result) | ||
| | ||
| | ||
| @pytest.mark.parametrize("empty", [[], pd.Series(), np.array([])]) | ||
| def test_isin_empty(empty): | ||
| s = pd.Categorical(["a", "b"]) | ||
| expected = np.array([False, False], dtype=bool) | ||
| | ||
| result = s.isin(empty) | ||
| tm.assert_numpy_array_equal(expected, result) | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can u make Isin