Given the following data frame:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'A': [1, 1, np.nan], 'B': [2.2, np.nan, 2.2]})
    df

         A    B
    0  1.0  2.2
    1  1.0  NaN
    2  NaN  2.2

If I want to replace the NaN value in column A with the value that repeats in that column (1) and do the same for column B, what sort of fillna() do I need to use?

         A    B
    0  1.0  2.2
    1  1.0  NaN
    2  NaN  2.2

Looking for a generic solution as I really have thousands of rows. Thanks in advance!

3 Answers

3

fillna can take a dictionary of fill values in which each key is a column name.

Assuming you want to fill the columns with the value that is repeated the most, you can compute the dictionary with:

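    df = pd.DataFrame({
        'A': [1, 1, np.nan, 2],
        'B': [2.2, np.nan, 2.2, 1.9]
    })

    # the first row of df.mode() holds the most frequent value of each column
    fill_dict = df.mode().to_dict(orient='records')[0]

    # the keyword is `value`, not `values`
    df = df.fillna(value=fill_dict)
    df

         A    B
    0  1.0  2.2
    1  1.0  2.2
    2  1.0  2.2
    3  2.0  1.9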
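As a variation on the snippet above, here is a minimal sketch that passes the first row of `df.mode()` to `fillna` as a Series instead of converting it to a dict. The variable names `modes` and `filled` are just illustrative. One caveat: when several values tie for most frequent, `mode()` returns them sorted, so the first row holds the smallest tied value, which may or may not be what you want.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, np.nan, 2],
    'B': [2.2, np.nan, 2.2, 1.9],
})

# df.mode() returns one row per mode; row 0 holds the first mode of each
# column (NaN is ignored by default).
modes = df.mode().iloc[0]

# A Series indexed by column name behaves like the dict form:
# each column is filled with its own value.
filled = df.fillna(modes)
print(filled)
```

Whether to use the dict or the Series form is mostly a matter of taste; both fill each column independently.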

2

Why not simply:

    # equivalent to df.ffill(); newer pandas versions deprecate the method= keyword
    df.fillna(method='ffill')

    # df = pd.DataFrame({'A': [1, 1, np.nan, 2], 'B': [2.2, np.nan, 2.2, 1.9]})
    # df.fillna(method='ffill')
    #      A    B
    # 0  1.0  2.2
    # 1  1.0  2.2
    # 2  1.0  2.2
    # 3  2.0  1.9
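One thing to keep in mind with forward fill: it copies the previous row's value, so a NaN in the first row has nothing to copy from and stays NaN. A rough sketch of that edge case, using a made-up frame with a leading gap in column A and chaining a backward fill to cover it:

```python
import numpy as np
import pandas as pd

# Hypothetical data: column 'A' starts with a missing value.
df = pd.DataFrame({
    'A': [np.nan, 1, np.nan, 2],
    'B': [2.2, np.nan, 2.2, 1.9],
})

# Forward fill alone leaves row 0 of 'A' as NaN: nothing precedes it.
forward = df.ffill()

# Chaining a backward fill closes the leading gap with the next valid value.
both = df.ffill().bfill()

print(forward)
print(both)
```

Note that this fills with neighbouring values, not with the most frequent one, so it only matches the question's intent when the two happen to coincide.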

5 Comments

It's not clear from the OP if the fill value should just be the previous, or the most common value in the column.
Hmm, I understood it as the previous value, but you may be right.
From my example, this is the sort of simple answer I was looking for. However, what if I wanted to fill with the most common value? I'm ultimately trying to fill blanks resulting from a df.loc and groupby-transform procedure, which leaves some row values blank.
I'm going to post another related question in a minute that better summarizes what I'm trying to do.
Transpose first, then ffill or bfill: df.T.ffill().bfill().T
0
    import itertools
    import operator

    def most_common(L):
        # get an iterable of (item, iterable) pairs
        SL = sorted((x, i) for i, x in enumerate(L))
        # print 'SL:', SL
        groups = itertools.groupby(SL, key=operator.itemgetter(0))

        # auxiliary function to get "quality" for an item
        def _auxfun(g):
            item, iterable = g
            count = 0
            min_index = len(L)
            for _, where in iterable:
                count += 1
                min_index = min(min_index, where)
            # print 'item %r, count %r, minind %r' % (item, count, min_index)
            return count, -min_index

        # pick the highest-count/earliest item
        return max(groups, key=_auxfun)[0]

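and then apply it to the column, dropping the NaN values first so they are neither sorted nor counted, and assigning the result back:

    df['A'] = df['A'].fillna(most_common(df['A'].dropna().tolist()))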
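For comparison, here is a shorter sketch that gets a similar result with pandas' built-in `Series.mode` in place of the hand-written `most_common`. The helper name `fill_with_mode` is made up for this example. The tie-breaking differs: `Series.mode` returns tied values sorted, so this picks the smallest, whereas `most_common` above prefers the value that appears earliest in the list.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, np.nan], 'B': [2.2, np.nan, 2.2]})

def fill_with_mode(frame):
    """Return a copy with NaNs in each column replaced by that column's mode."""
    out = frame.copy()
    for col in out.columns:
        modes = out[col].mode()      # NaN is dropped by default
        if not modes.empty:          # an all-NaN column has no mode; leave it as is
            out[col] = out[col].fillna(modes.iloc[0])
    return out

result = fill_with_mode(df)
print(result)
```

Because it loops over the columns, the same call handles every column of the frame without naming them one by one.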

2 Comments

I'm impressed with your implementation of mode with itertools. But even if we forget for a moment that pandas DataFrames have their own mode method, using SciPy's mode function (scipy.stats.mode) would surely be much more reliable.
Yeah, I agree. I just forgot about mode, so I wrote this. I don't know whether it's a correct answer or not.