
I have a dataframe:

    import pandas as pd

    data = {'fruit': ['pear', 'pear', 'banana', 'pear', 'pear', 'apple', 'apple', 'cherry', 'cherry'],
            'fruit_type': ['unknown', 'pear', 'unknown', 'unknown', 'pear', 'unknown', 'apple', 'cherry', 'unknown'],
            'country': ['unknown', 'usa', 'unknown', 'unknown', 'ghana', 'unknown', 'russia', 'albania', 'unknown'],
            'id': ['011', '011', '011', '011', '011', '011', '011', '6', '6'],
            'month': ['unknown', 'march', 'unknown', 'unknown', 'january', 'unknown', 'march', 'january', 'unknown']
            }
    df = pd.DataFrame(data, columns=['fruit', 'fruit_type', 'country', 'id', 'month'])


I want to fill the cells that contain 'unknown' with values from another row of the same id group:

If the first row of an id group has 'unknown' in the month column, its unknown values should be taken from the next row.

If a row that is not the first in its id group has 'unknown' in the month column, its unknown values should be taken from the previous row.
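Read column by column, the two rules amount to filling each gap within an id group from the previous row (forward fill) and then filling only the leading gaps, which have no previous row, from the next row (backward fill). A minimal sketch on a single hypothetical group's month column, with NaN standing in for 'unknown':

```python
import numpy as np
import pandas as pd

# One hypothetical id group's month column; NaN marks 'unknown'.
month = pd.Series([np.nan, 'march', np.nan, np.nan, 'january'])

# Forward fill: every gap after the first known value takes the previous row's value.
step1 = month.ffill()
# Backward fill: only the leading gap (no previous row) is left, so it takes the next row's value.
result = step1.bfill()
print(result.tolist())  # ['march', 'march', 'march', 'march', 'january']
```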

Can anyone see the problem?

Expected output dataframe:


What have you tried so far? - Commented Jul 26, 2021 at 9:27

2 Answers


Use replace() to turn 'unknown' into NaN, then group by 'id', forward fill and then backward fill within each group, and finally assign the result back to df:

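    df = df.replace('unknown', float('nan'))
    # If the replace above doesn't work, use:
    # df = df.replace('unknown', float('nan'), regex=True)

    # group_keys=False keeps the original row index; since pandas 2.0,
    # apply would otherwise add 'id' as an extra index level
    df = df.groupby('id', group_keys=False).apply(lambda x: x.ffill().bfill())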

Output of df:

        fruit  fruit_type  country   id    month
    0    pear        pear      usa  011    march
    1    pear        pear      usa  011    march
    2  banana        pear      usa  011    march
    3    pear        pear      usa  011    march
    4    pear        pear    ghana  011  january
    5   apple        pear    ghana  011  january
    6   apple       apple   russia  011    march
    7  cherry      cherry  albania    6  january
    8  cherry      cherry  albania    6  january
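An alternative sketch, not taken from either answer, that avoids groupby().apply() and its index handling by using the built-in DataFrameGroupBy.ffill and bfill on every column except id. It rebuilds the question's data so it runs on its own; the helper name `cols` is just for illustration:

```python
import pandas as pd

# The question's sample data.
data = {'fruit': ['pear', 'pear', 'banana', 'pear', 'pear', 'apple', 'apple', 'cherry', 'cherry'],
        'fruit_type': ['unknown', 'pear', 'unknown', 'unknown', 'pear', 'unknown', 'apple', 'cherry', 'unknown'],
        'country': ['unknown', 'usa', 'unknown', 'unknown', 'ghana', 'unknown', 'russia', 'albania', 'unknown'],
        'id': ['011', '011', '011', '011', '011', '011', '011', '6', '6'],
        'month': ['unknown', 'march', 'unknown', 'unknown', 'january', 'unknown', 'march', 'january', 'unknown']}
df = pd.DataFrame(data)

# Turn the 'unknown' marker into real missing values.
df = df.replace('unknown', float('nan'))

# Fill every column except the grouping key.
cols = [c for c in df.columns if c != 'id']
# Forward fill within each id group: a gap takes the previous row's value.
df[cols] = df.groupby('id')[cols].ffill()
# Backward fill within each id group: only leading gaps remain, so they take the next row's value.
df[cols] = df.groupby('id')[cols].bfill()
print(df)
```

Because the grouped ffill/bfill results keep the original row index, they can be assigned straight back to df[cols], and the id column is never touched.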

2 Comments

@jezrael How, sir? Please tell me!
Seems OK, sorry.

Replace 'unknown' with missing values, then forward fill and backward fill the missing values per group:

    import numpy as np

    f = lambda x: x.ffill().bfill()
    df = df.replace('unknown', np.nan).groupby(df['id']).transform(f)
    print(df)
        fruit  fruit_type  country   id    month
    0    pear        pear      usa  011    march
    1    pear        pear      usa  011    march
    2  banana        pear      usa  011    march
    3    pear        pear      usa  011    march
    4    pear        pear    ghana  011  january
    5   apple        pear    ghana  011  january
    6   apple       apple   russia  011    march
    7  cherry      cherry  albania    6  january
    8  cherry      cherry  albania    6  january
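One case the sample data does not exercise: if every row of an id group is 'unknown' in some column, there is nothing to copy, so those cells are still NaN after the forward/backward fill. A minimal sketch on hypothetical data (the id '7' group is made up) that applies the same transform and then restores the original 'unknown' marker with fillna:

```python
import numpy as np
import pandas as pd

# Hypothetical data: id '7' has no known month at all.
df = pd.DataFrame({
    'id': ['011', '011', '7', '7'],
    'month': ['unknown', 'march', 'unknown', 'unknown'],
})

f = lambda x: x.ffill().bfill()
out = df.replace('unknown', np.nan).groupby(df['id']).transform(f)

# A group with no known value has nothing to copy, so its cells stay NaN;
# put the original 'unknown' marker back for those cells.
out = out.fillna('unknown')
print(out['month'].tolist())  # ['march', 'march', 'unknown', 'unknown']
```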
