Pandas Fill NaN with Column Values

Question

Given the following data frame:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'A': [1, 1, np.nan], 'B': [2.2, np.nan, 2.2]})
    df

         A    B
    0  1.0  2.2
    1  1.0  NaN
    2  NaN  2.2

If I want to replace the NaN value in column A with the value that repeats in that column (1) and do the same for column B, what sort of fillna() do I need to use?

         A    B
    0  1.0  2.2
    1  1.0  NaN
    2  NaN  2.2

Looking for a generic solution as I really have thousands of rows. Thanks in advance!

Paul H · Accepted Answer · 2016-03-23 05:07:01Z

fillna can take a dictionary of values, where each key is a column name and each value is the fill value for that column.

Assuming you want to fill the columns with the value that is repeated the most, you can compute the dictionary with:

    df = pd.DataFrame({
        'A': [1, 1, np.nan, 2],
        'B': [2.2, np.nan, 2.2, 1.9]
    })
    # first row of df.mode() as a {column: most common value} dict
    fill_dict = df.mode().to_dict(orient='records')[0]
    df = df.fillna(value=fill_dict)
    df

         A    B
    0  1.0  2.2
    1  1.0  2.2
    2  1.0  2.2
    3  2.0  1.9
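A slightly shorter variant of the same idea, as a minimal sketch: fillna also accepts a Series indexed by column name, so the first row of df.mode() can be passed directly. The names fill_values and filled are illustrative only. Note that when several values tie for most common, df.mode() lists them in ascending order, so row 0 picks the smallest of the tied values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, np.nan, 2],
    'B': [2.2, np.nan, 2.2, 1.9],
})

# df.mode() returns a DataFrame; row 0 holds the first mode of each column.
fill_values = df.mode().iloc[0]

# A Series indexed by column name works like a {column: value} dict here.
filled = df.fillna(fill_values)
print(filled)
```

NaN values are ignored by mode() by default, so the fill value is always taken from the non-missing entries.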

Colonel Beauvel · Accepted Answer · 2016-03-23 08:03:16Z

Why not simply:

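    df.fillna(method='ffill')
    # In recent pandas versions, where fillna(method=...) is deprecated,
    # the equivalent call is df.ffill().

    # df = pd.DataFrame({'A': [1, 1, np.nan, 2], 'B': [2.2, np.nan, 2.2, 1.9]})
    # df.fillna(method='ffill')
    #      A    B
    # 0  1.0  2.2
    # 1  1.0  2.2
    # 2  1.0  2.2
    # 3  2.0  1.9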

5 Comments

Paul H Over a year ago

It's not clear from the OP if the fill value should just be the previous, or the most common value in the column.

Colonel Beauvel Over a year ago

Hmm, I understood it as the previous value, but you may be right.

Dance Party Over a year ago

Based on my example, this is the sort of simple answer I was looking for. However, what if I wanted to fill with the most common value? I'm ultimately trying to fill blanks resulting from a df.loc and groupby-transform procedure, which leaves some row values blank.

Dance Party Over a year ago

I'm going to post another related question in a minute that better summarizes what I'm trying to do.

csaladenes Over a year ago

To fill across columns instead, transpose first, then ffill or bfill: df.T.ffill().bfill().T
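The two readings discussed in these comments only coincide by accident on the question's data. A small sketch on illustrative data (not from the question) where the previous value and the most common value differ, together with the transposed row-wise fill from the last comment:

```python
import numpy as np
import pandas as pd

# Illustrative frame: in 'A' the last valid value (2) is not the mode (1).
df = pd.DataFrame({'A': [1, 1, 2, np.nan], 'B': [2.2, np.nan, 2.2, 1.9]})

forward = df.ffill()                    # previous value within each column
by_mode = df.fillna(df.mode().iloc[0])  # most common value of each column
across = df.T.ffill().bfill().T         # fill from the neighbouring column

print(forward['A'].tolist())  # [1.0, 1.0, 2.0, 2.0]
print(by_mode['A'].tolist())  # [1.0, 1.0, 2.0, 1.0]
print(across['A'].tolist())   # [1.0, 1.0, 2.0, 1.9]
```

The transposed version copies values between columns A and B in the same row, so it answers a different question from either column-wise fill.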

Dark Matter · Accepted Answer · 2016-03-23 05:14:13Z

    import itertools
    import operator

    def most_common(L):
        # get an iterable of (item, iterable) pairs
        SL = sorted((x, i) for i, x in enumerate(L))
        # print 'SL:', SL
        groups = itertools.groupby(SL, key=operator.itemgetter(0))

        # auxiliary function to get "quality" for an item
        def _auxfun(g):
            item, iterable = g
            count = 0
            min_index = len(L)
            for _, where in iterable:
                count += 1
                min_index = min(min_index, where)
            # print 'item %r, count %r, minind %r' % (item, count, min_index)
            return count, -min_index

        # pick the highest-count/earliest item
        return max(groups, key=_auxfun)[0]

and then just add

    # drop NaN before counting: NaN does not sort or compare reliably
    df['A'] = df['A'].fillna(most_common(df['A'].dropna().tolist()))

I'm impressed with your implementation of mode with itertools. But even if we forget for a moment that pandas DataFrames have their own mode method, using an existing mode function such as scipy.stats.mode (NumPy itself does not provide one) would surely be much more reliable.
Yes, I agree. I just forgot about mode, so I wrote this instead. I don't know whether it's a correct answer or not.
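For comparison, a shorter hand-rolled helper in the same spirit can be sketched with collections.Counter from the standard library. This is an illustrative alternative, not the answer author's code; it reuses the name most_common, skips NaN explicitly, and relies on Counter ordering equal counts by first occurrence (Python 3.7+).

```python
import math
from collections import Counter

def most_common(values):
    """Return the most frequent non-NaN item; ties go to the earliest one."""
    kept = [v for v in values
            if not (isinstance(v, float) and math.isnan(v))]
    if not kept:
        raise ValueError("no non-NaN values to choose from")
    # Counter.most_common lists equal counts in order of first occurrence.
    return Counter(kept).most_common(1)[0][0]

print(most_common([1.0, 1.0, float('nan'), 2.0]))  # 1.0
```

It can be used the same way as above, for example df['A'].fillna(most_common(df['A'].tolist())), although the built-in mode method remains the simpler choice inside pandas.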
