Description
Code Sample
pd.Series(list("abcde"), dtype="string").value_counts(normalize=True)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-40-718821f804b4> in <module>
      1 # lang.value_counts(normalize=True)
----> 2 pd.Series(list("abcde"), dtype="string").value_counts(normalize=True)

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/base.py in value_counts(self, normalize, sort, ascending, bins, dropna)
   1233             normalize=normalize,
   1234             bins=bins,
-> 1235             dropna=dropna,
   1236         )
   1237         return result

~/anaconda/envs/grapy/lib/python3.7/site-packages/pandas/core/algorithms.py in value_counts(values, sort, ascending, normalize, bins, dropna)
    729 
    730     if normalize:
--> 731         result = result / float(counts.sum())
    732 
    733     return result

AttributeError: 'IntegerArray' object has no attribute 'sum'

Problem description
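The problem seems to be that value_counts() on a Series with the "string" extension dtype returns an Int64 result backed by an IntegerArray, and IntegerArray does not implement sum. Series.sum, however, does work on a Series backed by an extension array, as the expected output below shows: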
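Until normalize=True works for this dtype, one possible workaround is to count without normalizing and divide by the total yourself, which only relies on Series.sum and never calls sum on the IntegerArray directly. This is a minimal sketch, assuming float64 proportions are what you want; the helper name normalized_value_counts is hypothetical and not part of pandas.

```python
import pandas as pd


def normalized_value_counts(s: pd.Series) -> pd.Series:
    """Hypothetical helper: proportions of each distinct value in ``s``.

    Sidesteps value_counts(normalize=True), which fails for the "string"
    dtype on pandas 1.0.x because IntegerArray has no ``sum`` method.
    """
    # Plain counts; for the "string" dtype these come back as nullable Int64.
    counts = s.value_counts(normalize=False)
    # Counts are never missing, so converting to numpy float64 is lossless and
    # gives the same result dtype regardless of which nullable types are used.
    counts = counts.astype("float64")
    # Series.sum on a numpy-backed float Series is always available.
    return counts / counts.sum()


print(normalized_value_counts(pd.Series(list("abcde"), dtype="string")))
```

Converting to float64 before dividing also means the result dtype is the same on pandas versions where Int64 division would otherwise produce a nullable Float64 Series.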
Expected Output
vc = pd.Series(list("abcde"), dtype="string").value_counts(normalize=False)
print(vc)
print(vc / vc.sum())

d    1
e    1
b    1
c    1
a    1
dtype: Int64
d    0.2
e    0.2
b    0.2
c    0.2
a    0.2
dtype: float64

May be related to #22843?
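The expected output above can be turned into a regression check: value_counts(normalize=True) should give the same proportions as dividing the plain counts by their sum. This is a sketch only, assuming a pandas version in which IntegerArray supports sum (on 1.0.3 the normalize=True call raises the AttributeError shown earlier); the function name check_normalized_matches_manual is hypothetical.

```python
import numpy as np
import pandas as pd


def check_normalized_matches_manual(values) -> bool:
    """Hypothetical check: normalize=True agrees with counts / counts.sum()."""
    s = pd.Series(values, dtype="string")
    vc = s.value_counts(normalize=False)
    expected = vc / vc.sum()
    result = s.value_counts(normalize=True)  # raises on pandas 1.0.3
    # Align both results on the index and compare as plain float arrays, so the
    # check does not depend on which float dtype (numpy or nullable) is returned.
    result = result.sort_index()
    expected = expected.sort_index()
    same_index = list(result.index) == list(expected.index)
    same_values = np.allclose(
        result.to_numpy(dtype="float64"), expected.to_numpy(dtype="float64")
    )
    return same_index and same_values


print(check_normalized_matches_manual(list("abcde")))
```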
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Darwin
OS-release : 19.2.0
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8
pandas : 1.0.3
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.0
pip : 20.0.2
setuptools : 46.1.1.post20200322
Cython : None
pytest : 5.4.1
hypothesis : None
sphinx : 2.4.4
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : 0.9.3
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.2.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : 5.4.1
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.15
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : 0.48.0