-
- Notifications
You must be signed in to change notification settings - Fork 19.4k
[DEPR]: Deprecate setting nans in categories #10929
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DEPR]: Deprecate setting nans in categories #10929
Conversation
pandas/core/categorical.py Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to change the setting of categories in the fastpath section (near top of __init__). and add a fastpath to avoid this check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fastpath here goes thru set_categories which calls ._set_categories. It still needs to go through the validate part so I'm not really sure what it's fastpathing.
All the warnings are caught in the test_categorical. There were quite a few.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's my point - if I am fast pathing I don't need any validation at all (so I want to call _set_categories) to assign them but not validate
rather than
self._categories = ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think fastpath is being misused here. Some tests failed when I changed the constructor to do things properly (fastpath=True -> set categories directly, no validation). For one example (from the tests):
factor = Categorical([0,1,2,0,1,2]*100, ['a', 'b', 'c'], name='cat', fastpath=True)breaks since it's assumed elsewhere that categories is always an Index. Followup in a separate issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fast path is purely internal - the point is you already have things setup correctly and simply easy to fall the constructor
your example is not valid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My example came from the test suite :)
There's about 8 tests that fail / error if I change the constructor to skip category validation when fastpath is true. Not sure if I'll be able to get to those today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm this extra validation is a problem
fast path creation of categorically is already slow
668ef9f to 3745ca4 Compare | @jreback cherry picked your PR, thanks for that. I can squash down to one commit if you want. I'll try to do a perf test today. |
| squash up to you. yeh just do a quick perf test. merge when green. |
doc/source/categorical.rst Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be on the previous line?
3745ca4 to ac84968 Compare | @jorisvandenbossche Addressed your 3 points. I also
running asv now. The only categorical test is in |
ac84968 to 8d87f3b Compare In [14]: %timeit Categorical(codes, cat_idx, fastpath=True) 100 loops, best of 3: 4.01 ms per loop In [15]: %timeit Categorical(values, cat_idx) 1 loops, best of 3: 279 ms per loop👍 I've got an asv benchmark here. Just wrote and pushed an |
| @TomAugspurger awesome yeh in dask |
| asv doesn't show much of a difference (assuming I ran this correctly). asv run 30f672c..8d87f3b --bench=categorical_constructor · Fetching recent changes · Creating environments · Discovering benchmarks ·· Uninstalling from py3.4-Cython-matplotlib-numexpr-numpy-openpyxl-scipy-sqlalchemy-tables-xlrd-xlwt. ·· Building for py3.4-Cython-matplotlib-numexpr-numpy-openpyxl-scipy-sqlalchemy-tables-xlrd-xlwt............................... ·· Installing into py3.4-Cython-matplotlib-numexpr-numpy-openpyxl-scipy-sqlalchemy-tables-xlrd-xlwt.. · Running 4 total benchmarks (2 commits * 1 environments * 2 benchmarks) [ 0.00%] · For pandas commit hash 8d87f3be: [ 0.00%] ·· Building for py3.4-Cython-matplotlib-numexpr-numpy-openpyxl-scipy-sqlalchemy-tables-xlrd-xlwt................................... [ 0.00%] ·· Benchmarking py3.4-Cython-matplotlib-numexpr-numpy-openpyxl-scipy-sqlalchemy-tables-xlrd-xlwt [ 25.00%] ··· Running categoricals.categorical_constructor.time_fastpath 5.84ms [ 50.00%] ··· Running categoricals.categorical_constructor.time_regular_constructor 250.19ms [ 50.00%] · For pandas commit hash e757e8a8: [ 50.00%] ·· Building for py3.4-Cython-matplotlib-numexpr-numpy-openpyxl-scipy-sqlalchemy-tables-xlrd-xlwt...................................... [ 50.00%] ·· Benchmarking py3.4-Cython-matplotlib-numexpr-numpy-openpyxl-scipy-sqlalchemy-tables-xlrd-xlwt [ 75.00%] ··· Running categoricals.categorical_constructor.time_fastpath 4.83ms [100.00%] ··· Running categoricals.categorical_constructor.time_regular_constructor 246.23msThose two hashes are |
| ok, merge away then |
[DEPR]: Deprecate setting nans in categories
| Merged, thanks. |
Closes dask#1565 For compatability with pandas-dev/pandas#10929 where it was decided that `pd.Categorical(['a', np.nan], categories=['a', np.nan])` Should raise a `FutureWarning`. Now we just drop missing values before computing the distincts for the categories.
Closes #1565 For compatability with pandas-dev/pandas#10929 where it was decided that `pd.Categorical(['a', np.nan], categories=['a', np.nan])` Should raise a `FutureWarning`. Now we just drop missing values before computing the distincts for the categories.
WIP still
Closes #10748
I have to run for now, but will pick this up later today.
I think I'm missing a few in the tests, since the warning is showing up in a bunch of places (there's a way to convert those to errors for testing right?)
I had to refactor a couple function that were setting
._categoriesdirectly instead of usingCategorical._set_categories. I kept that in a separate commit. Could do a bit more refactoring with thevalidate_categoriesstuff, but that can be separate.And I need to figure out the proper
stacklevelfor this warning. I think I used 3 for now.