1

I have an excel file to analyze but have a lot of data that I don't want to analyze, can we delete a column if we don't find the value SpaceX string in the first row like following

SL# State District 10/01/2021 10/01/2021 10/01/2021 11/01/2021 11/01/2021 11/01/2021 SpaceX in Star in StarX out SpaceX out Star out StarX in 1 wb al 10 11 12 13 14 15 2 wb not 23 22 20 24 25 25 

Now here I want to delete the columns where in the rows SpaceX not there. And then Want to delete the SpaceX as well to shift up the rows ultimate output will look like as follows

SL# State District 10/01/2021 11/01/2021 1 wb al 10 13 2 wb not 23 24 

Tried with loc and iloc functions but no clue at the moment.

Also checked the answer: Drop columns if rows contain a specific value in Pandas but it's different. I'm checking the substring not the exact value match.

1 Answer 1

2

Firstly create a boolean mask with startswith() method and fillna() method:

mask=df.loc[0].str.startswith('SpaceX').fillna(True) 

Finally use Transpose(T) attribute,loc accessor and drop() method:

df=df.T.loc[mask].T.drop(0) 

Output of df:

 SL# State District 2021-01-10 00:00:00 2021-01-11 00:00:00 2021-01-12 00:00:00 1 1.0 wb al 10 13 16 2 2.0 wb not 23 13 16 
Sign up to request clarification or add additional context in comments.

8 Comments

Nope, its not coming like this. Sample file added in github. github.com/DiceInstitute/data-analysis/blob/main/sample.xlsx
Updated answer kindly have a look :)
Updated answer...kindly have a look :)
Since the column names are same for that dates it's giving a positional value. Tried with csv read as well, coming as 19/05/2021.3 20/05/2021.3 21/05/2021.3 22/05/2021.3 Can we clean it? Like truncate ?
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.