-
- Notifications
You must be signed in to change notification settings - Fork 19.4k
BUG: SparseSeries.value_counts ignores fill_value #12835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
79164d6 to 56a415a Compare pandas/core/algorithms.py Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually with only a small rewriting I think you could do:
if com.is_extension_dtype(values) or com.is_datetimelike(values): result = values.values_counts(dropna) result.name = name else: ..... if we define DateTimeIndex.value_counts that properly handles localization before/after (instead of in here)
will be much cleaner.
Further it makes sense to define .value_counts() in tseries/base.py (and move the inference from core/base.py/value_counts.py there. To handle the dispatching of period/DatetimeIndex.
a bit of refactoring but moves functionaility where it should be.
81fac4e to 91c4c2f Compare | Found current master doesn't work for datetimetz. Let me check. |
89ab6ea to add609f Compare | Additionally fixing the other |
c14cbf5 to f1d62ee Compare | @sinhrks pls rebase when you can |
f1d62ee to 456a4a9 Compare Current coverage is 84.04%@@ master #12835 diff @@ ========================================== Files 136 136 Lines 49908 49931 +23 Methods 0 0 Messages 0 0 Branches 0 0 ========================================== + Hits 41876 41962 +86 + Misses 8032 7969 -63 Partials 0 0
|
pandas/core/algorithms.py Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this won't happen here as is_extension_type will do this
456a4a9 to 57f6667 Compare pandas/tests/test_groupby.py Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed @slow as groupby.value_coutns has separate logic depending on numpy version.
57f6667 to caeaed9 Compare caeaed9 to 2392e7c Compare | result = result.sort_values(ascending=ascending) | ||
| values = values.view(np.int64) | ||
| keys, counts = htable.value_count_scalar64(values, dropna) | ||
| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, though I still think we should handle in tseries/base, e.g. define DatetimeIndexOpsMixIN and just defer to it like we are doing for sparse/categorical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sparse/categorical value_counts only supports dropna kwds. It may better to be an internal method.
If we do the same thing using DatetimeIndexOpsMixin, the impl should be:
DatetimeIndexOpsMixin's publicvalue_countscallsalgorithms.py(as it needs to handle kwds likenormalize)algorithms.pyconvert datetime-likes to datetime-likeIndexto handleSeries- Calls
DatetimeIndexOpsMixin's internal_value_counts
Suppose to be little complex, or do you have anything in your mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah ok, it can ge tricky. ok, we'll keep in mind for later. ideally to put specific logic. merging.
git diff upstream/master | flake8 --diffAlso fixed an issue which categorical
value_countsresets name.