
I have the following data frame:

df = pd.DataFrame([['1','aa','ccc','rere','thth','my desc 1','','my feature2 1'], ['1','aa','fff','flfl','ipip','my desc 2','',''], ['1','aa','mmm','rprp','','','',''], ['2','aa','ccc','rprp','','','my feature1 1',''], ['2','aa','fff','bubu','thth','my desc 3','',''], ['2','aa','mmm','fafa','rtrt','my desc 4','',''], ['3','aa','ccc','blbl','thth','my desc 5','my feature1 2','my feature2 2'], ['3','aa','fff','arar','amam','my desc 6','',''], ['3','aa','mmm','acac','ryry','my desc 7','',''],['4','bb','coco','rere','','','','my feature2 3'], ['4','bb','inin','mimi','rere','my desc 8','',''], ['4','bb','itit','toto','enen','my desc 9','',''], ['4','bb','spsp','glgl','pepe','my desc 10','',''], ['5','bb','coco','baba','mpmp','my desc 11','my feature1 3',''], ['5','bb','inin','rere','','','',''],['5','bb','itit','toto','hrhr','my desc 12','',''], ['5','bb','spsp','glgl','lolo','my desc 13','','']], columns=['foo', 'bar','name_input','value_input','bulb','desc','feature1', 'feature2']) 

Now I need to delete rows so that only the rows with a non-empty feature1 or feature2 remain, giving the output below.

df = pd.DataFrame([['1','aa','ccc','rere','thth','my desc 1','','my feature2 1'], ['2','aa','ccc','rprp','','my desc 3','my feature1 1',''], ['3','aa','ccc','blbl','thth','my desc 5','my feature1 2','my feature2 2'], ['4','bb','coco','rere','','my desc 8','','my feature2 3'], ['5','bb','coco','baba','mpmp','my desc 11','my feature1 3','']], columns=['foo', 'bar','name_input','value_input','bulb','desc','feature1', 'feature2']) 

I tried each of the following, but none of them works.

df = df.dropna(subset=['feature1', 'feature2'])
df.dropna(thresh=5, axis=0, inplace=True)
df = df[df.feature2.notnull()]
df = df[pd.notnull(df[['feature1', 'feature2']])]

Any help is much appreciated!

2 Answers

astype(bool)

Empty strings evaluate as False in a boolean context. Use filter to select just the feature columns (regex='fea' matches any column whose name contains "fea"), then apply astype(bool) followed by any(axis=1) to keep the rows where at least one feature column is non-empty.

df[df.filter(regex='fea').astype(bool).any(axis=1)]

    foo  bar  name_input  value_input  bulb        desc       feature1       feature2
 0    1   aa         ccc         rere  thth   my desc 1                 my feature2 1
 3    2   aa         ccc         rprp                    my feature1 1
 6    3   aa         ccc         blbl  thth   my desc 5  my feature1 2  my feature2 2
 9    4   bb        coco         rere                                   my feature2 3
13    5   bb        coco         baba  mpmp  my desc 11  my feature1 3

To match your results, we can back fill the desc column before filtering, so that each empty desc takes the next non-empty value below it:

feat = df.filter(regex='feat').astype(bool).any(axis=1)
desc = df.desc.where(df.desc.astype(bool)).bfill()
df.assign(desc=desc)[feat]

    foo  bar  name_input  value_input  bulb        desc       feature1       feature2
 0    1   aa         ccc         rere  thth   my desc 1                 my feature2 1
 3    2   aa         ccc         rprp         my desc 3  my feature1 1
 6    3   aa         ccc         blbl  thth   my desc 5  my feature1 2  my feature2 2
 9    4   bb        coco         rere         my desc 8                 my feature2 3
13    5   bb        coco         baba  mpmp  my desc 11  my feature1 3

4 Comments

Very slick, I didn't know about empty strings evaluating to False. Thanks for the explanation.
Thank you. However, the "desc" column should have a value in every row, including "my desc 3" and "my desc 8". How do we get that?
You need to explain why 'my desc 3' goes where it does.
For each foo, a desc value is mandatory. feature1 and feature2 depend on bulb: if bulb is empty, then one of the two features will be empty, but desc will always be present. The first available desc of each foo is required.
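
Following the clarification above (the first available desc within each foo is required), a per-group variant could look roughly like this. It is only a sketch: the frame is a reduced, hypothetical subset of the question's data with just the relevant columns, and it uses groupby(...).transform('first') rather than the plain back fill from the answer, so a missing desc is never filled from a different foo.

```python
import pandas as pd

# Assumption: a reduced subset of the question's frame (relevant columns only)
df = pd.DataFrame(
    [
        ['1', 'my desc 1', '', 'my feature2 1'],
        ['1', 'my desc 2', '', ''],
        ['2', '', 'my feature1 1', ''],
        ['2', 'my desc 3', '', ''],
        ['3', 'my desc 5', 'my feature1 2', 'my feature2 2'],
        ['3', 'my desc 6', '', ''],
    ],
    columns=['foo', 'desc', 'feature1', 'feature2'],
)

# Turn empty strings into NaN so that 'first' skips them
desc = df['desc'].where(df['desc'] != '')

# First non-missing desc within each foo, broadcast back to every row
first_desc = desc.groupby(df['foo']).transform('first')

# Rows where at least one feature column is non-empty
has_feature = df.filter(regex='^feature').ne('').any(axis=1)

result = df.assign(desc=first_desc)[has_feature]
print(result)
```

On the question's data the plain back fill and the per-group version give the same rows; they would differ only if a foo had no desc at all, in which case the per-group version leaves NaN instead of borrowing from the next foo.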

Another method is to convert the blank strings to true NaN values, then call dropna with how='all' so that a row is dropped only when both feature columns are missing. The final fillna('') restores the empty strings.

import numpy as np

df.replace('', np.nan).dropna(subset=['feature1', 'feature2'], how='all').fillna('')

    foo  bar  name_input  value_input  bulb        desc       feature1       feature2
 0    1   aa         ccc         rere  thth   my desc 1                 my feature2 1
 3    2   aa         ccc         rprp                    my feature1 1
 6    3   aa         ccc         blbl  thth   my desc 5  my feature1 2  my feature2 2
 9    4   bb        coco         rere                                   my feature2 3
13    5   bb        coco         baba  mpmp  my desc 11  my feature1 3
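
As a small illustration of why the blank strings have to become NaN first: dropna only looks at real missing values, and an empty string is not one. The sketch below uses a hypothetical three-row frame (not the question's data) to show dropna leaving the frame untouched until the replacement is done.

```python
import numpy as np
import pandas as pd

# Assumption: a tiny illustrative frame, not the question's data
df = pd.DataFrame({
    'feature1': ['', 'my feature1 1', ''],
    'feature2': ['my feature2 1', '', ''],
})

# Empty strings are not missing values, so nothing is dropped here
unchanged = df.dropna(subset=['feature1', 'feature2'])

# After replacing '' with NaN, how='all' drops rows where both columns are missing
cleaned = (
    df.replace('', np.nan)
      .dropna(subset=['feature1', 'feature2'], how='all')
      .fillna('')
)

print(len(unchanged))
print(cleaned)
```

The same reasoning applies to notnull: it returns True for every cell of the original frame, so filtering on it keeps all rows.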
