1

I have data like this format:

Date,Open,High,Low,Close,Adj Close,Volume 2019-07-31,0.44,0.4401,0.44,0.44,0.44,32900 2019-08-01,0.45,0.45,0.45,0.45,0.45,200 2019-08-02,0.44,0.44,0.43,0.44,0.44,13800 2019-08-08,0.45,0.4501,0.45,0.4501,0.4501,400 2019-08-15,0.43,0.43,0.43,0.43,0.43,300 2019-08-15,0.0,0.0,0.0,0.43,0.43,0 

Notice the last row has empty data.
How can i filter this row or delete this row?

df = None for ticker in tickers: try: df = pd.read_csv('stock_data/daily/{}.csv'.format(ticker), parse_dates=True, index_col=0).dropna() except FileNotFoundError as e: continue # continue with next ticker df_closes = df['Close'] if len(df_closes) < 4: continue # continue with next ticker df_closes = df_closes[pd.notnull(df['Close'])] # delete rows with empty data df_closes = df_closes.reindex(index=df_closes.index[::-1]) # reversing 
4

2 Answers 2

0

I assume that you consider a record to be empty given the Volume column is zero, we can filter these out with:

df = df[df['Volume'] > 0] 

Or we can check if at least one of Open, High, Low is different from zero with, and filter out rows that have only zeros in these columns:

df = df[df[['Open', 'High', 'Low']].any(axis=1)] 

You can remove duplicate indices with:

df = df[~df.index.duplicated()]

We can add a parameter to duplicated(..) which specifies which one to retain. Possible values are 'first', 'last', and False (this means all items of a duplicated index are removed). The default is 'first'.

You can make a combination of first removing records with no values, and then for example remove items with duplicated indices. I would not perform this in the opposite order, since then you risk removing the record with data, and retain the one without data.

Sign up to request clarification or add additional context in comments.

Comments

0

Delete the rows with the codes as below:

#Deleting the whole row if a specific column(multiple rows can be added into this list) has value zero df[df['High'] != 0] df[df['High'].ne(0)] 

or

#If values in any of the column in a row has zero df[(df != 0).all(1)] df[~(df == 0).any(axis=1)] 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.