How to replace NaN values in a dataframe column

Question

I have a Pandas Dataframe as below:

     itm                 Date  Amount
67   420  2012-09-30 00:00:00   65211
68   421  2012-09-09 00:00:00   29424
69   421  2012-09-16 00:00:00   29877
70   421  2012-09-23 00:00:00   30990
71   421  2012-09-30 00:00:00   61303
72   485  2012-09-09 00:00:00   71781
73   485  2012-09-16 00:00:00     NaN
74   485  2012-09-23 00:00:00   11072
75   485  2012-09-30 00:00:00  113702
76   489  2012-09-09 00:00:00   64731
77   489  2012-09-16 00:00:00     NaN

When I try to apply a function to the Amount column, I get the following error:

ValueError: cannot convert float NaN to integer

I have tried applying a function using math.isnan, the pandas .replace method, the .sparse data attribute from pandas 0.9, and an if NaN == NaN check inside a function. I have also looked at this Q/A. None of them works.

How do I do it?

wjandrea · Accepted Answer · 2024-04-20 04:12:22Z

1020

DataFrame.fillna() or Series.fillna() will do this for you.

Example:

In [7]: df
Out[7]:
          0         1
0       NaN       NaN
1 -0.494375  0.570994
2       NaN       NaN
3  1.876360 -0.229738
4       NaN       NaN

In [8]: df.fillna(0)
Out[8]:
          0         1
0  0.000000  0.000000
1 -0.494375  0.570994
2  0.000000  0.000000
3  1.876360 -0.229738
4  0.000000  0.000000

To fill the NaNs in only one column, select just that column.

In [12]: df[1] = df[1].fillna(0)

In [13]: df
Out[13]:
          0         1
0       NaN  0.000000
1 -0.494375  0.570994
2       NaN  0.000000
3  1.876360 -0.229738
4       NaN  0.000000

Or you can use the built in column-specific functionality:

df = df.fillna({1: 0})

edited Apr 20, 2024 at 4:12

wjandrea


answered Nov 8, 2012 at 18:54

Aman


7 Comments

max Over a year ago

Is it guaranteed that df[1] is a view rather than a copy of the original DF? Obviously, if there's a rare situation where it's a copy, it would cause a super-troublesome bug. Is there a clear statement on that in pandas documentation?

Aman Over a year ago

@max See this, might address your question: stackoverflow.com/questions/23296282/…

max Over a year ago

Thanks. Is my understanding correct that in that answer an "indexer that sets" is the outermost indexing operation (executed just before the assignment)? If so, any assignment that uses only a single indexer is guaranteed to be safe, which makes your code safe?

Stefan Falk Over a year ago

Why is this not working for me? see: stackoverflow.com/questions/39452095/how-to-fillna-with-value-0

Sip Over a year ago

the last example throws a SettingWithCopyWarning


Asclepius · Accepted Answer · 2018-10-07 19:25:30Z

200

It is not guaranteed that the slicing returns a view or a copy. You can do

df['column'] = df['column'].fillna(value)

edited Oct 7, 2018 at 19:25

Asclepius


answered Oct 6, 2016 at 9:10

rakesh


2 Comments

TimCera Over a year ago

Just discovered the "inplace=True" problem. This answer avoids the issue and I think is the cleanest solution presented.

wjandrea Over a year ago

What slicing? Is this meant to be a reply to Aman's answer? If so, I just edited Aman's answer to avoid inplace, so this is no longer needed.

Anton Protopopov · Accepted Answer · 2017-06-15 05:11:20Z

71

You could use replace to change NaN to 0:

import pandas as pd
import numpy as np

# for column
df['column'] = df['column'].replace(np.nan, 0)

# for whole dataframe
df = df.replace(np.nan, 0)

# inplace
df.replace(np.nan, 0, inplace=True)

answered Jun 15, 2017 at 5:11

Anton Protopopov


2 Comments

Shyam Bhimani Over a year ago

Will it replace only NaN, or will it also replace NA values, like df.fillna(0) does? I am looking for a solution that replaces only NaN and not NA.

Anton Protopopov Over a year ago

@ShyamBhimani it should replace only NaN i.e. values where np.isnan is True

Petter Friberg · Accepted Answer · 2016-09-13 21:13:37Z

31

The below code worked for me.

import pandas

df = pandas.read_csv('somefile.txt')
df = df.fillna(0)

edited Sep 13, 2016 at 21:13

Petter Friberg


answered Sep 13, 2016 at 20:59

Cornel Ciobanu


1 Comment

Siraj Alam Over a year ago

One-liner: df = pandas.read_csv('somefile.txt').fillna(0)

wjandrea · Accepted Answer · 2024-04-20 04:26:44Z

I just wanted to provide a special case. If you're using a multi-index or otherwise using an index-slicer, the inplace=True option may not be enough to update the slice you've chosen. For example in a 2x2 level multi-index this will not change any values (as of pandas 0.15):

idx = pd.IndexSlice
df.loc[idx[:,mask_1], idx[mask_2,:]].fillna(value=0, inplace=True)

The "problem" is that the chaining breaks the fillna ability to update the original dataframe. I put "problem" in quotes because there are good reasons for the design decisions that led to not interpreting through these chains in certain situations. Also, this is a complex example (though I really ran into it), but the same may apply to fewer levels of indexes depending on how you slice.

The solution is DataFrame.update:

df.update(df.loc[idx[:,mask_1], idx[[mask_2],:]].fillna(value=0))

It's one line, reads reasonably well (sort of) and eliminates any unnecessary messing with intermediate variables or loops while allowing you to apply fillna to any multi-level slice you like!

If anybody can find places where this doesn't work, please post in the comments. I've been messing with it and looking at the source, and it seems to solve at least my multi-index slice problems.
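For anyone who wants to try the update pattern on something small, here is a minimal sketch. The frame, the level names and the selected labels are made up for illustration, and it only uses a row MultiIndex with plain columns to keep things short, so treat it as an assumption-laden demo rather than the exact case from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a two-level row index and two plain columns.
index = pd.MultiIndex.from_product(
    [["g1", "g2"], ["x", "y"]], names=["group", "kind"]
)
df = pd.DataFrame(
    {"a": [np.nan, 1.0, np.nan, 3.0], "b": [np.nan, np.nan, 5.0, np.nan]},
    index=index,
)

idx = pd.IndexSlice

# Fill NaN only in column "a" for rows whose second level is "x" ...
filled_slice = df.loc[idx[:, "x"], ["a"]].fillna(0)

# ... then write the filled values back into the original frame.
df.update(filled_slice)

print(df)
```

Only the selected cells change; NaN values outside the slice (all of column "b" here) are left alone, because update aligns on index and columns and ignores NaN in the frame it is given.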

Beside the point, but are they supposed to be different, idx[mask_2,:] and idx[[mask_2],:] ?
@wjandrea not 100% sure I understand what you're referring to since you seem to have edited it already so I can't see what it used to be but this is a 2x2 level multi-index so masks could be different
You can look at the revision history to see what changed; click where it says "edited" to see it. I didn't edit the part I'm asking about. If it helps clarify, I mean in one place you wrote idx[mask_2,:] and in the other place you wrote idx[[mask_2],:], but it's not clear why.
@wjandrea Thanks for the education on the revision history. Just two different examples from some working code. I don't remember what required the difference at the time, could prob be eliminated.

Farrukh Faizy · Accepted Answer · 2019-07-16 20:02:37Z

You can also use a dictionary to fill the NaN values of specific columns in the DataFrame, rather than filling the whole DataFrame with a single value.

import pandas as pd

df = pd.read_excel('example.xlsx')
df.fillna({
    'column1': 'Write your values here',
    'column2': 'Write your values here',
    'column3': 'Write your values here',
    'column4': 'Write your values here',
    # ...
    'column-n': 'Write your values here',
}, inplace=True)

Martin · Accepted Answer · 2018-07-07 19:03:15Z

An easy way to fill the missing values:

Filling string columns: when a string column has missing (NaN) values, fill them with the most frequent value.

df['string column name'] = df['string column name'].fillna(df['string column name'].mode().values[0])

Filling numeric columns: when a numeric column has missing (NaN) values, fill them with the column mean.

df['numeric column name'] = df['numeric column name'].fillna(df['numeric column name'].mean())

Filling NaN with zero:

df['column name'] = df['column name'].fillna(0)
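A small sketch of the mode and mean fills on made-up data (the column names "city" and "score" are hypothetical). It assigns the result back to the column instead of relying on inplace on a column selection.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "city": ["Oslo", np.nan, "Oslo", "Bergen"],
        "score": [1.0, np.nan, 3.0, 5.0],
    }
)

# Most frequent value for the string column, mean for the numeric column.
df["city"] = df["city"].fillna(df["city"].mode().iloc[0])
df["score"] = df["score"].fillna(df["score"].mean())

print(df)
```

Note that mode() can return several values when there is a tie, so taking the first one is a choice, not the only option.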

wjandrea · Accepted Answer · 2024-04-20 04:43:20Z

Considering the particular column Amount in the above table is of integer type, the following would be a solution:

df['Amount'] = df['Amount'].fillna(0).astype(int)

Similarly, you can fill it with various data types like float, str and so on.

In particular, I would keep a single, consistent dtype in the column so that its values can be compared with one another.
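To tie this back to the question, here is a sketch with a few rows modelled on the asker's table (the values are copied from it, the frame itself is rebuilt by hand). It shows the integer conversion failing while the NaN is present and succeeding after the fill.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"itm": [485, 485, 485], "Amount": [71781, np.nan, 11072]})

# The NaN forces the column to float64, and converting it directly fails.
try:
    df["Amount"].astype(int)
except ValueError as exc:
    print("conversion failed:", exc)

# Fill first, then convert.
df["Amount"] = df["Amount"].fillna(0).astype(int)

print(df.dtypes)
```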

wjandrea · Accepted Answer · 2024-04-20 04:40:52Z

To replace NA values in pandas:

df['column_name'].fillna(value_to_be_replaced, inplace=True)

If inplace=False (the default), fillna does not update the DataFrame df; it returns a new object with the values filled in.
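A short sketch of the default, non-inplace behaviour on a made-up one-column frame: the original stays untouched until the returned Series is assigned back.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column_name": [1.0, np.nan, 3.0]})

# With the default inplace=False, the result is a new Series ...
filled = df["column_name"].fillna(0)

# ... and df itself still holds the NaN until the result is assigned back.
still_missing = int(df["column_name"].isna().sum())
print(still_missing)

df["column_name"] = filled
print(df)
```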

wjandrea · Accepted Answer · 2024-04-20 04:30:38Z

To replace NaN in different columns with different values:

replacement = {'column_A': 0, 'column_B': -999, 'column_C': -99999}
df = df.fillna(value=replacement)
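Here is a runnable sketch of the dictionary form on a made-up frame. The extra column "column_D" is an assumption added to show that columns missing from the dictionary are left as they are.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "column_A": [1.0, np.nan],
        "column_B": [np.nan, 2.0],
        "column_D": [np.nan, 4.0],  # not listed in the dict below
    }
)

replacement = {"column_A": 0, "column_B": -999}
df = df.fillna(value=replacement)

print(df)
```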

Ankhnesmerira · Accepted Answer · 2021-03-31 03:24:29Z

5

This works for me, but no one's mentioned it. Could there be something wrong with it?

df.loc[df['column_name'].isnull(), 'column_name'] = 0
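As a quick check on a made-up column, the mask assignment gives the same values as fillna(0). The difference is that it writes into df directly, so nothing has to be assigned back.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"column_name": [1.0, np.nan, 3.0, np.nan]})

# What fillna would produce, computed before df is modified.
expected = df["column_name"].fillna(0)

# Boolean-mask assignment writes into df directly, no reassignment needed.
df.loc[df["column_name"].isnull(), "column_name"] = 0

same = df["column_name"].equals(expected)
print(same)
```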

answered Mar 31, 2021 at 3:24

Ankhnesmerira


2 Comments

Leo Ma Over a year ago

only this worked for me :D , df.fillna(0) didn't work (changed nothing with no error)

wjandrea Over a year ago

@LeoMa .fillna on its own doesn't save the result. You either need to save it yourself, like df = df.fillna(0) or use inplace: df.fillna(0, inplace=True)

wjandrea · Accepted Answer · 2024-04-20 04:51:54Z

There are primarily two options for imputing or filling missing values (NaN / np.nan). For numerical replacements across one or more columns,

df['Amount'].fillna(value=...) is sufficient.

From the Documentation:

value : scalar, dict, Series, or DataFrame
    Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict/Series/DataFrame will not be filled. This value cannot be a list.

This means the fill value has to be a scalar, dict, Series, or DataFrame; a list is not permissible.

For more specialized imputations use SimpleImputer():

import numpy as np
from sklearn.impute import SimpleImputer

si = SimpleImputer(strategy='constant', missing_values=np.nan, fill_value='Replacement_Value')
df[['Col-1', 'Col-2']] = si.fit_transform(X=df[['C-1', 'C-2']])

wjandrea · Accepted Answer · 2024-04-20 04:55:42Z

If you want to fill NaN for a specific column you can use loc:

import numpy as np
import pandas as pd

d1 = {"Col1": ['A', 'B', 'C'], "fruits": ['Avocado', 'Banana', np.nan]}
d1 = pd.DataFrame(d1)

output:

  Col1   fruits
0    A  Avocado
1    B   Banana
2    C      NaN

d1.loc[d1.Col1=='C', 'fruits'] = 'Carrot'

output:

  Col1   fruits
0    A  Avocado
1    B   Banana
2    C   Carrot

rubengavidia0x · Accepted Answer · 2022-03-29 22:38:01Z

I think it's also worth mentioning and explaining the parameters of fillna(), such as method, axis, limit, etc.

From the documentation we have:

Series.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)

Fill NA/NaN values using the specified method.

Parameters

value [scalar, dict, Series, or DataFrame]
    Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). Values not in the dict/Series/DataFrame will not be filled. This value cannot be a list.

method [{‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None]
    Method to use for filling holes in reindexed Series.
    pad / ffill: propagate last valid observation forward to next valid.
    backfill / bfill: use next valid observation to fill gap.

axis [{0 or ‘index’}]
    Axis along which to fill missing values.

inplace [bool, default False]
    If True, fill in-place. Note: this will modify any other views on this object (e.g., a no-copy slice for a column in a DataFrame).

limit [int, default None]
    If method is specified, this is the maximum number of consecutive NaN values to forward/backward fill. In other words, if there is a gap with more than this number of consecutive NaNs, it will only be partially filled. If method is not specified, this is the maximum number of entries along the entire axis where NaNs will be filled. Must be greater than 0 if not None.

downcast [dict, default is None]
    A dict of item->dtype of what to downcast if possible, or the string ‘infer’ which will try to downcast to an appropriate equal type (e.g. float64 to int64 if possible).

OK. Let's start with the method= parameter, which offers forward fill (ffill) and backward fill (bfill). ffill copies the previous non-missing value forward.

e.g. :

import pandas as pd
import numpy as np

inp = [{'c1':10, 'c2':np.nan, 'c3':200},
       {'c1':np.nan, 'c2':110, 'c3':210},
       {'c1':12, 'c2':np.nan, 'c3':220},
       {'c1':12, 'c2':130, 'c3':np.nan},
       {'c1':12, 'c2':np.nan, 'c3':240}]
df = pd.DataFrame(inp)

     c1     c2     c3
0  10.0    NaN  200.0
1   NaN  110.0  210.0
2  12.0    NaN  220.0
3  12.0  130.0    NaN
4  12.0    NaN  240.0

Forward fill:

df.fillna(method="ffill")

     c1     c2     c3
0  10.0    NaN  200.0
1  10.0  110.0  210.0
2  12.0  110.0  220.0
3  12.0  130.0  220.0
4  12.0  130.0  240.0

Backward fill:

df.fillna(method="bfill")

     c1     c2     c3
0  10.0  110.0  200.0
1  12.0  110.0  210.0
2  12.0  130.0  220.0
3  12.0  130.0  240.0
4  12.0    NaN  240.0
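One caveat: recent pandas releases (2.1 and later, as far as I know) deprecate the method= argument of fillna in favour of the dedicated ffill() and bfill() methods. A sketch of the same forward and backward fills written that way, rebuilding the example frame from above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "c1": [10, np.nan, 12, 12, 12],
        "c2": [np.nan, 110, np.nan, 130, np.nan],
        "c3": [200, 210, 220, np.nan, 240],
    }
)

forward = df.ffill()   # same result as fillna(method="ffill")
backward = df.bfill()  # same result as fillna(method="bfill")

print(forward)
print(backward)
```

A leading NaN survives a forward fill and a trailing NaN survives a backward fill, since there is no neighbouring value to copy from.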

The axis parameter lets us choose the direction of the fill:

Fill directions:

ffill:

Axis = 1
Method = 'ffill'
-----------> direction

df.fillna(method="ffill", axis=1)

     c1     c2     c3
0  10.0   10.0  200.0
1   NaN  110.0  210.0
2  12.0   12.0  220.0
3  12.0  130.0  130.0
4  12.0   12.0  240.0

Axis = 0 # by default
Method = 'ffill'
|
|  # direction
|
V

e.g:
# This is the ffill default
df.fillna(method="ffill", axis=0)

     c1     c2     c3
0  10.0    NaN  200.0
1  10.0  110.0  210.0
2  12.0  110.0  220.0
3  12.0  130.0  220.0
4  12.0  130.0  240.0

bfill:

axis = 0
method = 'bfill'
^
|
|
|

df.fillna(method="bfill", axis=0)

     c1     c2     c3
0  10.0  110.0  200.0
1  12.0  110.0  210.0
2  12.0  130.0  220.0
3  12.0  130.0  240.0
4  12.0    NaN  240.0

axis = 1
method = 'bfill'
<-----------

df.fillna(method="bfill", axis=1)

      c1     c2     c3
0   10.0  200.0  200.0
1  110.0  110.0  210.0
2   12.0  220.0  220.0
3   12.0  130.0    NaN
4   12.0  240.0  240.0

# alias:
# 'ffill' == 'pad'
# 'bfill' == 'backfill'

limit parameter:

df

     c1     c2     c3
0  10.0    NaN  200.0
1   NaN  110.0  210.0
2  12.0    NaN  220.0
3  12.0  130.0    NaN
4  12.0    NaN  240.0

Replace only the first NaN element in each column:

df.fillna(value = 'Unavailable', limit=1)

            c1           c2           c3
0         10.0  Unavailable        200.0
1  Unavailable        110.0        210.0
2         12.0          NaN        220.0
3         12.0        130.0  Unavailable
4         12.0          NaN        240.0

Replace at most the first two NaN elements in each column:

df.fillna(value = 'Unavailable', limit=2)

            c1           c2           c3
0         10.0  Unavailable        200.0
1  Unavailable        110.0        210.0
2         12.0  Unavailable        220.0
3         12.0        130.0  Unavailable
4         12.0          NaN        240.0
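The same limit behaviour can be sketched with a numeric fill value, which keeps the columns as floats. This rebuilds two columns of the example frame and assumes the per-column counting shown in the output above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "c1": [10, np.nan, 12, 12, 12],
        "c2": [np.nan, 110, np.nan, 130, np.nan],
    }
)

# Fill at most one NaN per column, counting from the top.
limited = df.fillna(0, limit=1)

print(limited)
```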

downcast parameter:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   c1      4 non-null      float64
 1   c2      2 non-null      float64
 2   c3      4 non-null      float64
dtypes: float64(3)
memory usage: 248.0 bytes

df.fillna(method="ffill",downcast='infer').info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   c1      5 non-null      int64
 1   c2      4 non-null      float64
 2   c3      5 non-null      int64
dtypes: float64(1), int64(2)
memory usage: 248.0 bytes

cottontail · Accepted Answer · 2024-11-28 04:29:51Z

If you're reading data with missing values from a file using read_csv etc., then you can pass keep_default_na=False to read missing values as empty strings (""). In specific cases, this is useful because it achieves what fillna or replace does in one function call (with one less copy in memory).

df = pd.read_csv(filepath, keep_default_na=False)

# the above is same as
df = pd.read_csv(filepath).fillna("")

# or
df = pd.read_csv(filepath).replace(np.nan, "")

If the dataframe contains numbers, then you can pass dtypes to read_csv to construct a dataframe with the desired dtype columns.

df = pd.read_csv(filepath, keep_default_na=False, dtype={"col1": "Int64", "col2": "string", "col3": "Float64"})
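To see the effect of keep_default_na without a file on disk, here is a sketch that reads a made-up CSV string through io.StringIO (the column names and values are hypothetical).

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = "name,city\nAnn,\n,Oslo\n"

with_nan = pd.read_csv(io.StringIO(csv_text))
as_empty = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(with_nan.isna().sum().sum())  # empty fields were read as NaN
print(as_empty.isna().sum().sum())  # empty fields were read as ""
```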

Another way to replace NaN is via mask()/where() methods. They are similar methods where mask replaces values that satisfy the condition whereas where replaces values that do not satisfy the condition. So to use, we just have to filter the NaN values and replace them with the desired value.

import pandas as pd

df = pd.DataFrame({'a': [1, float('nan'), float('nan')], 'b': [float('nan'), 'a', 'b']})

df = df.where(df.notna(), 10)                   # for the entire dataframe
df['a'] = df['a'].where(df['a'].notna(), 10)    # for a single column

The advantage of this method is that we can conditionally replace NaN values with it. The following is an example where NaN values in df are replaced by 10 if the condition cond is satisfied.

cond = pd.DataFrame({'a': [True, True, False], 'b':[False, True, True]})
df = df.mask(df.isna() & cond, 10)

Under the hood, fillna() calls where() (source) which in turn calls numpy.where() if the dataframe is small and numexpr.evaluate if it's large (source). So fillna/mask/where are essentially the same method for the purposes of replacing NaN values. On the other hand, replace() (another method given on this page) is a numpy.putmask operation (source). Because numexpr is faster than numpy for large arrays, replace may be outperformed by the other methods on very large dataframes.

On a tangential note, it's common for a dataframe to have a literal string 'NaN' instead of an actual NaN value. To make sure that a dataframe indeed has NaN values, check with df.isna().any(). If it returns False, when it should contain NaN, then you probably have 'NaN' strings, in which case, use replace to convert them into NaN or, even better, replace with the value you're meant to replace it with. For example:

df = pd.DataFrame({'a': ['a', 'b', 'NaN']})
df = df.replace('NaN', 'c')
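The check described above can be sketched like this, reusing the small frame from the example. It takes the other route mentioned, converting the 'NaN' strings into real NaN values first so that fillna works as usual afterwards.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["a", "b", "NaN"]})

# The literal string "NaN" is not a missing value.
has_missing_before = bool(df.isna().any().any())

# Convert the strings to real NaN values, after which fillna works as usual.
df = df.replace("NaN", np.nan)
has_missing_after = bool(df.isna().any().any())

df = df.fillna("c")
print(has_missing_before, has_missing_after)
```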

NaN appearing in a data frame is not a string, so the appropriate answer would be - << df = df(np.nan, 0) >> Anyways, Aman's answer using DataFrame.fillna() is the correct one.
@Malgo could you elaborate what ”df = df(np.nan, 0)” is (because it’s missing the method call)? I agree that fillna is the canonical method; I just wanted to elaborate on how it ties to other pandas methods, namely where.
Sorry, I mistyped earlier. I meant: << df = df.replace(np.nan, 0) >>

Deb · Accepted Answer · 2024-04-20 03:23:38Z

Using a lambda expression, it is also possible to replace NaN with 0.

Below is an example:

# pd.isnull(x) tests each value; the unbound dss2['Score'].isnull would always be truthy
dss3 = dss2['Score'].apply(lambda x: 0 if pd.isnull(x) else x)
print(dss3)
