How to fill NaN values in a pandas dataframe, with variable values?

Question

I have a dataframe:

           Isolate1  Isolate2  Isolate3  Isolate4
    2       NaN       NaN    AGTCTA       AGT
    5       NaN        GC       NaN       NaN

And I want to replace the NaN values in the Isolate1 column with dashes, one dash for each character of the non-NaN values in the other columns (or the maximum length, if the other columns hold values of different lengths), ending up with something like this:

           Isolate1  Isolate2  Isolate3  Isolate4
    2    ------       NaN    AGTCTA       AGT
    5        --        GC       NaN       NaN

I have tried the following:

    index_sizes_to_replace = {}
    for row in df.itertuples():
        indel_sizes = []
        # position 0 of each tuple is the index
        for i, value in enumerate(row[1:]):
            if pd.notnull(value):
                indel_sizes.append((i, len(value)))
        max_size = max([size for i, size in indel_sizes])
        index_sizes_to_replace[row[0]] = max_size

Now I have the number of dashes to replace each NaN value with, but I don't know how to do the filling. I tried this:

    for index, size in index_sizes_to_replace.iteritems():
        df.iloc[index].fillna("-"*size, inplace=True)

But it didn't work. Any suggestions?
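The attempt above likely fails for several reasons: dict.iteritems() only exists in Python 2 (Python 3 uses .items()); df.iloc[index] selects rows by position, while the sizes dict is keyed by index labels (2 and 5 here); and fillna(..., inplace=True) on the row returned by df.iloc[...] operates on an intermediate Series, so it is not guaranteed to write back to df (it would also fill the NaN cells of the other columns, not just Isolate1). A minimal sketch of the same loop-based idea that assigns through df.loc[label, 'Isolate1'] instead, assuming the two-row example frame from the question:

```python
import numpy as np
import pandas as pd

# Example frame from the question (index labels 2 and 5)
df = pd.DataFrame(
    {"Isolate1": [np.nan, np.nan],
     "Isolate2": [np.nan, "GC"],
     "Isolate3": ["AGTCTA", np.nan],
     "Isolate4": ["AGT", np.nan]},
    index=[2, 5],
)

# Pass 1 (as in the question): longest non-NaN string per row, keyed by index label
index_sizes_to_replace = {}
for row in df.itertuples():
    sizes = [len(value) for value in row[1:] if pd.notnull(value)]
    index_sizes_to_replace[row.Index] = max(sizes, default=0)

# Isolate1 starts as an all-NaN float column; make it object so it can hold strings
df["Isolate1"] = df["Isolate1"].astype(object)

# Pass 2: .items() instead of Python 2's .iteritems(), and write through
# df.loc[row_label, column] so only Isolate1 in the frame itself is updated
for label, size in index_sizes_to_replace.items():
    if pd.isna(df.loc[label, "Isolate1"]):
        df.loc[label, "Isolate1"] = "-" * size

print(df)
```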

Anton vBR · Accepted Answer · 2018-02-22 21:22:25Z

Let's try:

    import pandas as pd
    import numpy as np

    # np.nan rather than np.NaN: the NaN alias was removed in NumPy 2.0
    data = dict(Isolate1=[np.nan, np.nan, 'A'],
                Isolate2=[np.nan, 'ABC', 'A'],
                Isolate3=['AGT', np.nan, 'A'],
                Isolate4=['AGTCTA', np.nan, 'A'])
    df = pd.DataFrame(data)

Original solution:

    df['Isolate1'] = df.apply(lambda x: '-' * x.str.len().max().astype(int), axis=1)

To ignore Isolate1 when computing the lengths:

    df['Isolate1'] = df.iloc[:, 1:].apply(lambda x: x.str.len().max().astype(int) * '-', axis=1)

Output:

           Isolate1  Isolate2  Isolate3  Isolate4
    0    ------       NaN       AGT    AGTCTA
    1       ---       ABC       NaN       NaN
    2         -         A         A         A

@Anton vBR Edit: to keep the existing (non-NaN) values in Isolate1, fill only the rows where it is NaN.

    # Create a mask of the rows where Isolate1 is NaN
    m = pd.isna(df['Isolate1'])
    df.loc[m, 'Isolate1'] = df[m].apply(lambda x: '-' * x.str.len().max().astype(int), axis=1)

Output:

           Isolate1  Isolate2  Isolate3  Isolate4
    0    ------       NaN       AGT    AGTCTA
    1       ---       ABC       NaN       NaN
    2         A         A         A         A

@AntonvBR, thanks for the warning; fortunately that case doesn't happen.
@ScottBoston, but what happens if the Isolate1 column does not have a NaN value? It would be replaced with dashes according to the length of the values in the other columns, right? Can the apply be restricted to the NaN values only?
@AntonvBR, I modified the function to take the column name, rather than its position, as a parameter to make it more general. Thank you very much!
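Along the lines of the last comment, a sketch of a reusable helper that takes the column name as a parameter and fills only the NaN cells of that column. The asker's actual function is not shown, so the name fill_with_dashes is hypothetical; it uses the longest string among the other columns, as in the masked solution above:

```python
import numpy as np
import pandas as pd


def fill_with_dashes(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df in which the NaN cells of `col`
    are replaced by '-' repeated to the longest string in the other columns."""
    out = df.copy()
    # String lengths of the other columns; NaN is turned into '' (length 0) first
    lengths = out.drop(columns=col).fillna("").apply(lambda s: s.map(len))
    max_len = lengths.max(axis=1)
    # Only rows where `col` is NaN are touched; existing values are kept
    mask = out[col].isna()
    out[col] = out[col].astype(object)
    out.loc[mask, col] = max_len[mask].map(lambda n: "-" * int(n))
    return out


data = dict(Isolate1=[np.nan, np.nan, "A"],
            Isolate2=[np.nan, "ABC", "A"],
            Isolate3=["AGT", np.nan, "A"],
            Isolate4=["AGTCTA", np.nan, "A"])
result = fill_with_dashes(pd.DataFrame(data), "Isolate1")
print(result)
```

Returning a copy rather than modifying in place keeps the caller's frame unchanged; assign the result back (df = fill_with_dashes(df, 'Isolate1')) to update it.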

Anton vBR · Accepted Answer · 2018-02-22 20:27:44Z


It looks a bit ugly, but it does the trick:

    import pandas as pd
    import numpy as np

    data = dict(Isolate1=[np.nan, np.nan],
                Isolate2=[np.nan, 'GC'],
                Isolate3=['AGTCTA', np.nan],
                Isolate4=['AGT', np.nan])
    df = pd.DataFrame(data, index=[2, 5])  # same index labels as in the question

    # axis must be passed by keyword to drop() in pandas >= 2.0
    df['Isolate1'] = (df.drop('Isolate1', axis=1)
                        .ffill(axis=1).bfill(axis=1)
                        .iloc[:, 0].replace('.', '-', regex=True))
    print(df)

Returns

           Isolate1  Isolate2  Isolate3  Isolate4
    2    ------       NaN    AGTCTA       AGT
    5        --        GC       NaN       NaN
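How the chain works: ffill followed by bfill along each row leaves the row's first non-NaN value in the first column, and the regex '.' matches any single character, so replace turns every letter into a dash. Note that this picks the first non-NaN value in the row, not the longest one. A step-by-step sketch on the same data (the intermediate names first_non_null and dashes are just for illustration):

```python
import numpy as np
import pandas as pd

data = dict(Isolate1=[np.nan, np.nan],
            Isolate2=[np.nan, "GC"],
            Isolate3=["AGTCTA", np.nan],
            Isolate4=["AGT", np.nan])
df = pd.DataFrame(data, index=[2, 5])

others = df.drop(columns="Isolate1")
# Step 1: after ffill then bfill along each row, column 0 holds the row's
# first non-NaN value
first_non_null = others.ffill(axis=1).bfill(axis=1).iloc[:, 0]
# Step 2: regex '.' matches any single character, so each letter becomes '-'
dashes = first_non_null.replace(".", "-", regex=True)

print(first_non_null.tolist())  # ['AGTCTA', 'GC']
print(dashes.tolist())          # ['------', '--']
```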

edited Feb 22, 2018 at 20:27

Anton vBR


answered Feb 22, 2018 at 20:25

MaxU - stand with Ukraine


Comments

MaxU - stand with Ukraine Over a year ago

@Anton vBR, thank you for editing! Don't you use pd.read_clipboard()?

Anton vBR Over a year ago

I do, but I like to fix the data sometimes. :)

BENY Over a year ago

Wow, can you explain the replace('.', '-', regex=True) part a little?

Rafael Rios Over a year ago

No, it does the trick; the maximum value was just my approximation, as I didn't know how to count the number of dashes.

cs95 Over a year ago

@Wen, Max, I've come up with something different. What do you two think?


cs95 · Accepted Answer · 2018-02-22 22:33:28Z

Setup

    df
           Isolate1  Isolate2  Isolate3  Isolate4
    0       NaN       NaN       AGT    AGTCTA
    1       NaN       ABC       NaN       NaN
    2         A         A         A         A

Solution
Using fillna + apply + str.__mul__:

    # DataFrame.map needs pandas >= 2.1; older versions spell it .applymap(len)
    df['Isolate1'] = df.Isolate1.fillna(
        df.fillna('').map(len).max(axis=1).apply('-'.__mul__)
    )

           Isolate1  Isolate2  Isolate3  Isolate4
    0    ------       NaN       AGT    AGTCTA
    1       ---       ABC       NaN       NaN
    2         A         A         A         A

The fillna with empty strings is necessary to avoid errors due to the type of NaN, right?
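That appears to be the reason: NaN is stored as a float, and len() is not defined for floats, so mapping len over the raw cells raises a TypeError, while filling NaN with '' first gives those cells length 0. A small sketch of that, plus the '-'.__mul__ step used in the solution above, on a single hypothetical row:

```python
import numpy as np
import pandas as pd

# Hypothetical row; dtype=object keeps NaN as a plain float among the strings
row = pd.Series([np.nan, "AGT", "AGTCTA"], dtype=object)

# len() is called on every cell, including NaN (a float), which has no length
raised = False
try:
    row.map(len)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Filling NaN with '' first makes every cell a string; the gaps get length 0
lengths = row.fillna("").map(len)
print(lengths.tolist())  # [0, 3, 6]

# '-'.__mul__(n) is the bound-method form of '-' * n
dashes = "-".__mul__(int(lengths.max()))
print(dashes)  # ------
```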
