28

Is there an option not to drop the indices with NaN in them? I think silently dropping these rows from the pivot will at some point cause someone serious pain.

    import pandas
    import numpy

    a = [['a', 'b', 12, 12, 12],
         ['a', numpy.nan, 12.3, 233., 12],
         ['b', 'a', 123.23, 123, 1],
         ['a', 'b', 1, 1, 1.]]
    df = pandas.DataFrame(a, columns=['a', 'b', 'c', 'd', 'e'])
    df_pivot = df.pivot_table(index=['a', 'b'], values=['c', 'd', 'e'], aggfunc=sum)
    print(df)
    print(df_pivot)

Output:

       a    b       c    d   e
    0  a    b   12.00   12  12
    1  a  NaN   12.30  233  12
    2  b    a  123.23  123   1
    3  a    b    1.00    1   1
              c    d   e
    a b
    a b   13.00   13  13
    b a  123.23  123   1

2 Answers

24

This is currently not supported, see this issue for the enhancement: https://github.com/pydata/pandas/issues/3729.

A workaround is to fill the NaN index values with a dummy, pivot, and then replace the dummy with NaN again:

    In [28]: df = df.reset_index()

    In [29]: df['b'] = df['b'].fillna('dummy')

    In [30]: df['dummy'] = np.nan

    In [31]: df
    Out[31]:
       a      b       c    d   e  dummy
    0  a      b   12.00   12  12    NaN
    1  a  dummy   12.30  233  12    NaN
    2  b      a  123.23  123   1    NaN
    3  a      b    1.00    1   1    NaN

    In [32]: df.pivot_table(index=['a', 'b'], values=['c', 'd', 'e'], aggfunc=sum)
    Out[32]:
                  c    d   e
    a b
    a b       13.00   13  13
      dummy   12.30  233  12
    b a      123.23  123   1

    In [33]: df.pivot_table(index=['a', 'b'], values=['c', 'd', 'e'], aggfunc=sum).reset_index().replace('dummy',np.nan).set_index(['a','b'])
    Out[33]:
                c    d   e
    a b
    a b     13.00   13  13
      NaN   12.30  233  12
    b a    123.23  123   1
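The fill, pivot, and restore steps above can be wrapped in a small function. This is only a sketch: `pivot_keep_nan` and the `sentinel` default are hypothetical names, not part of pandas, and it assumes the sentinel string never occurs as a real key and that the key columns hold strings.

```python
import numpy as np
import pandas as pd


def pivot_keep_nan(df, index, values, aggfunc="sum", sentinel="__nan__"):
    """Hypothetical helper: pivot without losing rows whose keys are NaN."""
    filled = df.copy()
    # Replace NaN keys with a sentinel so pivot_table does not drop them.
    filled[index] = filled[index].fillna(sentinel)
    pivoted = filled.pivot_table(index=index, values=values, aggfunc=aggfunc)
    # Turn the sentinel back into NaN in the key columns only,
    # then rebuild the index.
    out = pivoted.reset_index()
    out[index] = out[index].mask(out[index] == sentinel)
    return out.set_index(index)


a = [['a', 'b', 12, 12, 12],
     ['a', np.nan, 12.3, 233., 12],
     ['b', 'a', 123.23, 123, 1],
     ['a', 'b', 1, 1, 1.]]
df = pd.DataFrame(a, columns=['a', 'b', 'c', 'd', 'e'])
result = pivot_keep_nan(df, index=['a', 'b'], values=['c', 'd', 'e'])
print(result)
```

Restoring NaN only in the key columns avoids touching value cells that might happen to equal the sentinel, which the chained `.replace('dummy', np.nan)` would also rewrite.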

3 Comments

Maybe someone could inject a warning message when there are nan values in the index? I don't see that it needs to be "supported" really. Manually filling is fine, you just have to know that it needs to be done.
The problem is that this is a 'feature': when you groupby and a key is NaN, that row is excluded. I suppose you could have an option that controls this (false by default), and/or one that raises.
I agree but I can't imagine a warning would break anybody's notion of the feature. You could even have a flag in pivot_table to not print the warning. I'm just worried about safety.
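Until something like that exists in pandas itself, the warning suggested in these comments can be approximated in user code. A minimal sketch, assuming a hypothetical wrapper named `pivot_table_warn_nan` (not a pandas function) that checks the key columns before delegating to `pivot_table`:

```python
import warnings

import numpy as np
import pandas as pd


def pivot_table_warn_nan(df, index, **kwargs):
    """Hypothetical wrapper: warn when index keys contain NaN, then pivot."""
    # Count rows that have NaN in at least one of the key columns.
    n_missing = int(df[index].isna().any(axis=1).sum())
    if n_missing:
        warnings.warn(
            f"{n_missing} row(s) have NaN in {index} and may be dropped "
            "by pivot_table",
            UserWarning,
            stacklevel=2,
        )
    return df.pivot_table(index=index, **kwargs)


a = [['a', 'b', 12, 12, 12],
     ['a', np.nan, 12.3, 233., 12],
     ['b', 'a', 123.23, 123, 1],
     ['a', 'b', 1, 1, 1.]]
df = pd.DataFrame(a, columns=['a', 'b', 'c', 'd', 'e'])
pivoted = pivot_table_warn_nan(
    df, index=['a', 'b'], values=['c', 'd', 'e'], aggfunc="sum"
)
```

Callers who have already handled the NaN keys can silence it with the standard `warnings` filters, so no extra flag is needed.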
6

Currently the option "dropna=False" is supported by pivot_table:

    df.pivot_table(index=['a', 'b'], values=['c', 'd', 'e'], aggfunc=sum, dropna=False)

2 Comments

I tried this with pandas 1.3.0, but it does not work for NaN values in the index. It does work for columns, i.e. when one of the fields in values (c, d, e in your case) contains only NaN values.
Tried with 1.4.0 and it doesn't work with NaN in the index either.
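For the sum in the question, one alternative that avoids `pivot_table` altogether is `groupby` with `dropna=False`, which keeps NaN as its own group key. A sketch, assuming pandas 1.1 or later (where `groupby` accepts `dropna`); it only covers the index-only case from the question, not pivots that also spread a field across columns:

```python
import numpy as np
import pandas as pd

a = [['a', 'b', 12, 12, 12],
     ['a', np.nan, 12.3, 233., 12],
     ['b', 'a', 123.23, 123, 1],
     ['a', 'b', 1, 1, 1.]]
df = pd.DataFrame(a, columns=['a', 'b', 'c', 'd', 'e'])

# dropna=False makes groupby treat NaN in 'b' as a regular key,
# so the ('a', NaN) row survives the aggregation.
grouped = df.groupby(['a', 'b'], dropna=False)[['c', 'd', 'e']].sum()
print(grouped)
```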
