3

I have a dataframe:

  Isolate1 Isolate2 Isolate3 Isolate4
2      NaN      NaN   AGTCTA      AGT
5      NaN       GC      NaN      NaN

I want to replace the NaN values in the Isolate1 column with dashes, one dash for each letter of the non-NaN values in the other columns of the same row (or, if those values differ in length, as many dashes as the longest one), ending up with something like this:

  Isolate1 Isolate2 Isolate3 Isolate4
2   ------      NaN   AGTCTA      AGT
5       --       GC      NaN      NaN

I have tried the following:

index_sizes_to_replace = {}
for row in df.itertuples():
    indel_sizes = []
    # position 0 of the tuple is the index
    for i, value in enumerate(row[1:]):
        if pd.notnull(value):
            indel_sizes.append((i, len(value)))
    max_size = max([size for i, size in indel_sizes])
    index_sizes_to_replace[row[0]] = max_size

Now I have the number of dashes needed to replace the NaN values, but I don't know how to do the filling. I tried this:

for index, size in index_sizes_to_replace.items():
    df.iloc[index].fillna("-"*size, inplace=True)

But it didn't work. Any suggestions?

3 Answers

6

Let's try:

import pandas as pd
import numpy as np

# np.nan is used because the np.NaN alias was removed in NumPy 2.0
data = dict(Isolate1=[np.nan, np.nan, 'A'],
            Isolate2=[np.nan, 'ABC', 'A'],
            Isolate3=['AGT', np.nan, 'A'],
            Isolate4=['AGTCTA', np.nan, 'A'])

df = pd.DataFrame(data)

Original solution:

df['Isolate1'] = df.apply(lambda x: '-' * x.str.len().max().astype(int), axis=1) 

To ignore Isolate1:

df['Isolate1'] = df.iloc[:,1:].apply(lambda x: x.str.len().max().astype(int)*'-', axis=1) 

Output:

  Isolate1 Isolate2 Isolate3 Isolate4
0   ------      NaN      AGT   AGTCTA
1      ---      ABC      NaN      NaN
2        -        A        A        A

@Anton vBR: edit to handle non-NaN values in Isolate1.

# Create a mask of the rows where Isolate1 is NaN
m = pd.isna(df['Isolate1'])
df.loc[m, 'Isolate1'] = df[m].apply(lambda x: '-' * x.str.len().max().astype(int), axis=1)

Output:

  Isolate1 Isolate2 Isolate3 Isolate4
0   ------      NaN      AGT   AGTCTA
1      ---      ABC      NaN      NaN
2        A        A        A        A
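
The masking idea can also be wrapped in a small helper that takes the column name as a parameter. This is only a sketch: `fill_nan_with_dashes` is a hypothetical name, not something pandas provides, and it assumes every row with a NaN in the target column has at least one string in some other column.

```python
import numpy as np
import pandas as pd


def fill_nan_with_dashes(df, column):
    """Return a copy of df where NaN in `column` becomes a run of dashes.

    The run is as long as the longest string in the other columns of the
    same row. (Hypothetical helper, for illustration only.)
    """
    out = df.copy()
    mask = out[column].isna()
    if not mask.any():
        return out  # nothing to fill
    others = out.loc[mask].drop(columns=column)
    # Per-row length of the longest string; NaN cells are skipped by max().
    lengths = others.apply(lambda row: row.str.len().max(), axis=1)
    # Assumption: each masked row has at least one non-NaN value,
    # otherwise the length is itself NaN and int() fails.
    out.loc[mask, column] = lengths.map(lambda n: '-' * int(n))
    return out


df = pd.DataFrame({
    'Isolate1': [np.nan, np.nan, 'A'],
    'Isolate2': [np.nan, 'ABC', 'A'],
    'Isolate3': ['AGT', np.nan, 'A'],
    'Isolate4': ['AGTCTA', np.nan, 'A'],
})
result = fill_nan_with_dashes(df, 'Isolate1')
print(result)
```

Because the helper drops the target column before measuring lengths and only assigns where the mask is true, existing values such as the 'A' in the last row are left alone.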

10 Comments

This is a nice one! :)
I think this answer fits OP's needs better!
@AntonvBR, thanks for the warning; fortunately that case doesn't happen.
@ScottBoston, but what happens if the Isolate1 column does not have a NaN value? It would be replaced with dashes according to the length of the values in the other columns, right? Can the apply be restricted to NaN values only?
@AntonvBR, I modified the function to take the name of the column rather than its position, adding it as a parameter to make it more general. Thank you very much.
6

It looks a bit ugly, but it does the trick:

import pandas as pd
import numpy as np

data = dict(Isolate1=[np.nan, np.nan],
            Isolate2=[np.nan, 'GC'],
            Isolate3=['AGTCTA', np.nan],
            Isolate4=['AGT', np.nan])

df = pd.DataFrame(data)

# Take the first non-NaN value among the other columns of each row,
# then turn every character of it into a dash.
df['Isolate1'] = (df.drop(columns='Isolate1').ffill(axis=1).bfill(axis=1)
                  .iloc[:, 0].replace('.', '-', regex=True))

print(df)

Returns

  Isolate1 Isolate2 Isolate3 Isolate4
0   ------      NaN   AGTCTA      AGT
1       --       GC      NaN      NaN
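
On the `replace('.', '-', regex=True)` step: with `regex=True` the pattern `.` matches any single character, so every character in the string is swapped for a dash, giving a run of dashes of the same length. A minimal sketch with made-up values:

```python
import re

import pandas as pd

s = pd.Series(['AGTCTA', 'GC'])

# With regex=True, '.' matches any single character (except a newline),
# so each character is replaced by one dash.
dashes = s.replace('.', '-', regex=True)
print(dashes.tolist())

# The same substitution with the standard library, for comparison.
print(re.sub('.', '-', 'AGTCTA'))
```

Keep in mind that the `ffill`/`bfill` chain picks the first non-NaN value in each row, not the longest one, so the two only agree when the non-NaN values in a row have the same length or the longest happens to come first.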

8 Comments

@Anton vBR, thank you for editing! Don't you use pd.read_clipboard()?
I do, but I like to fix the data sometimes. :)
Wow, can you explain the replace('.', '-', regex=True) part a little?
No, it does the trick. The maximum value was just my way of approximating it, as I didn't know how to count the number of dashes.
@Wen, Max, I've come up with something different, what do you two think?
4

Setup

df

  Isolate1 Isolate2 Isolate3 Isolate4
0      NaN      NaN      AGT   AGTCTA
1      NaN      ABC      NaN      NaN
2        A        A        A        A

Solution
Using fillna + apply + str.__mul__:

df['Isolate1'] = df.Isolate1.fillna(
    df.fillna('').applymap(len).max(1).apply('-'.__mul__)
)

  Isolate1 Isolate2 Isolate3 Isolate4
0   ------      NaN      AGT   AGTCTA
1      ---      ABC      NaN      NaN
2        A        A        A        A
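
As for why the inner `fillna('')` is there: NaN is a float, and `len()` is not defined for floats, whereas an empty string has length 0 and so never wins the row-wise max. A small sketch of both points, using a made-up two-row frame and a per-column `map` in place of `applymap`:

```python
import numpy as np
import pandas as pd

# len() on NaN fails because NaN is a float.
try:
    len(np.nan)
    nan_has_len = True
except TypeError:
    nan_has_len = False
print(nan_has_len)

frame = pd.DataFrame({'Isolate2': [np.nan, 'GC'],
                      'Isolate3': ['AGTCTA', np.nan]})

# After fillna(''), every cell is a string, so len() is safe everywhere.
lengths = frame.fillna('').apply(lambda col: col.map(len)).max(axis=1)
print(lengths.tolist())
```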

2 Comments

Nice dude :-) !
The fillna with empty strings is necessary to avoid errors due to the type of NaN, right?
