python pandas: pivot_table silently drops indices with nans

Question

Is there an option not to drop the indices with NaN in them? I think silently dropping these rows from the pivot will at some point cause someone serious pain.

    import pandas
    import numpy

    a = [['a', 'b', 12, 12, 12],
         ['a', numpy.nan, 12.3, 233., 12],
         ['b', 'a', 123.23, 123, 1],
         ['a', 'b', 1, 1, 1.]]
    df = pandas.DataFrame(a, columns=['a', 'b', 'c', 'd', 'e'])
    df_pivot = df.pivot_table(index=['a', 'b'], values=['c', 'd', 'e'], aggfunc=sum)
    print(df)
    print(df_pivot)

Output:

       a    b       c    d   e
    0  a    b   12.00   12  12
    1  a  NaN   12.30  233  12
    2  b    a  123.23  123   1
    3  a    b    1.00    1   1
              c    d   e
    a b
    a b   13.00   13  13
    b a  123.23  123   1

Nwawel A Iroume · Accepted Answer · 2021-07-27 13:22:49Z

This is currently not supported, see this issue for the enhancement: https://github.com/pydata/pandas/issues/3729.

Workaround to fill the index with a dummy, pivot, and replace

    In [28]: df = df.reset_index()

    In [29]: df['b'] = df['b'].fillna('dummy')

    In [30]: df['dummy'] = np.nan

    In [31]: df
    Out[31]:
       a      b       c    d   e  dummy
    0  a      b   12.00   12  12    NaN
    1  a  dummy   12.30  233  12    NaN
    2  b      a  123.23  123   1    NaN
    3  a      b    1.00    1   1    NaN

    In [32]: df.pivot_table(index=['a', 'b'], values=['c', 'd', 'e'], aggfunc=sum)
    Out[32]:
                  c    d   e
    a b
    a b       13.00   13  13
      dummy   12.30  233  12
    b a      123.23  123   1

    In [33]: df.pivot_table(index=['a', 'b'], values=['c', 'd', 'e'], aggfunc=sum).reset_index().replace('dummy',np.nan).set_index(['a','b'])
    Out[33]:
                c    d   e
    a b
    a b     13.00   13  13
      NaN   12.30  233  12
    b a    123.23  123   1
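
The fill, pivot, restore sequence can also be wrapped in a small helper. This is only a sketch: the name `pivot_keep_nan` and the `sentinel` default are made up for illustration, and it assumes the key columns hold strings and that the sentinel never occurs in the real data.

```python
import numpy as np
import pandas as pd


def pivot_keep_nan(df, index, values, aggfunc="sum", sentinel="__nan__"):
    """Pivot while keeping rows whose key columns contain NaN.

    Hypothetical helper: NaN keys are swapped for a sentinel string
    before pivoting and swapped back afterwards.
    """
    filled = df.copy()
    # Replace NaN keys with a placeholder so the grouping step keeps them.
    filled[index] = filled[index].fillna(sentinel)
    pivoted = filled.pivot_table(index=index, values=values, aggfunc=aggfunc)
    # Restore NaN in the key columns only, then rebuild the index.
    out = pivoted.reset_index()
    out[index] = out[index].mask(out[index] == sentinel)
    return out.set_index(index)


# Sample data from the question.
rows = [["a", "b", 12, 12, 12],
        ["a", np.nan, 12.3, 233.0, 12],
        ["b", "a", 123.23, 123, 1],
        ["a", "b", 1, 1, 1.0]]
df = pd.DataFrame(rows, columns=["a", "b", "c", "d", "e"])

result = pivot_keep_nan(df, index=["a", "b"], values=["c", "d", "e"])
print(result)
```

Restoring NaN only in the key columns avoids touching value cells that might happen to equal the placeholder.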

Maybe someone could inject a warning message when there are nan values in the index? I don't see that it needs to be "supported" really. Manually filling is fine, you just have to know that it needs to be done.
The problem is that this is a 'feature': when you groupby and a key is NaN, that group is excluded. I suppose you could have an option that controls this (false by default), and/or raises.
I agree but I can't imagine a warning would break anybody's notion of the feature. You could even have a flag in pivot_table to not print the warning. I'm just worried about safety.
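
Until something like that exists in pandas itself, the warning discussed in these comments can be approximated in user code. The wrapper below is a sketch, not a pandas feature: `pivot_table_warn_nan` is a hypothetical name, and it only checks the key columns for NaN before delegating to `DataFrame.pivot_table`. The `catch_warnings` block at the end is there only to show that the warning fires.

```python
import warnings

import numpy as np
import pandas as pd


def pivot_table_warn_nan(df, index, **kwargs):
    """Call DataFrame.pivot_table, warning first if any key column has NaN.

    Hypothetical wrapper; extra keyword arguments are passed through unchanged.
    """
    keys = [index] if isinstance(index, str) else list(index)
    # Count rows where at least one key column is missing.
    n_missing = int(df[keys].isna().any(axis=1).sum())
    if n_missing:
        warnings.warn(
            f"{n_missing} row(s) have NaN in index columns {keys}; "
            "pivot_table may drop them",
            UserWarning,
            stacklevel=2,
        )
    return df.pivot_table(index=index, **kwargs)


# Sample data from the question.
rows = [["a", "b", 12, 12, 12],
        ["a", np.nan, 12.3, 233.0, 12],
        ["b", "a", 123.23, 123, 1],
        ["a", "b", 1, 1, 1.0]]
df = pd.DataFrame(rows, columns=["a", "b", "c", "d", "e"])

# Record warnings so the message can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    pivoted = pivot_table_warn_nan(
        df, ["a", "b"], values=["c", "d", "e"], aggfunc="sum"
    )
messages = [str(w.message) for w in caught]
print(messages)
```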

Ferro · Accepted Answer · 2020-08-31 10:23:28Z

Currently the option "dropna=False" is supported by pivot_table:

    df.pivot_table(index=['a', 'b'], values=['c', 'd', 'e'], aggfunc=sum, dropna=False)

2 Comments

Nwawel A Iroume Over a year ago

I tried this, but it does not work (tested with pandas 1.3.0): rows with NaN in the index are still dropped. It only helps for columns, i.e. when one of the fields in values (c, d, e in your case) contains nothing but NaN.

Dario Colombotto Over a year ago

Tried with 1.4.0 and it doesn't work with NaN in the index.
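
If `dropna=False` on `pivot_table` does not keep NaN keys on your version, one alternative is to skip `pivot_table` and group directly. `DataFrame.groupby` accepts `dropna=False` from pandas 1.1 onward, which makes NaN a group key of its own. A minimal sketch on the question's data, assuming a plain sum is all that is needed:

```python
import numpy as np
import pandas as pd

# Sample data from the question.
rows = [["a", "b", 12, 12, 12],
        ["a", np.nan, 12.3, 233.0, 12],
        ["b", "a", 123.23, 123, 1],
        ["a", "b", 1, 1, 1.0]]
df = pd.DataFrame(rows, columns=["a", "b", "c", "d", "e"])

# dropna=False keeps rows whose key is NaN as a separate group
# (requires pandas >= 1.1).
summed = df.groupby(["a", "b"], dropna=False)[["c", "d", "e"]].sum()
print(summed)
```

For a single aggregation with no `columns=` argument this gives the same shape as the pivot, with the NaN key kept as its own row.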
