Python pandas read file, write to excel

Question

I have a file like this :

SOME_INFO_BEGIN .... .... SOME_INFO_END ACTUAL_DETAIL_BEGIN TEST|1|23|abcd| TEST|2|5|efgs| TEST|3|124|zyz| ACTUAL_DETAIL_END

I only to read the lines between ACTUAL_DETAILS_BEGIN and ACTUAL_DETAILS_END and they will always start with TEST, however i also only to read the line which has 5 in the 3rd column.

The below code works for me except it gets all the 3 lines -

with open(dir+filename, 'r') as filehandle: filecontent = filehandle.readlines() ifa = [k for k in filecontent if 'TEST' in k] df = pd.DataFrame([sub.split("|") for sub in ifa]) df.columns= ['Type','Amt','Desc','Value1','Value2'] df1 = df[['Type','Desc']] print(df1) df1.to_excel (dir+"test.xlsx", index = False)

Q1. Is there a better way to code this ? For eg. how is the filehandle closed for excel write?

Q2. How do i only pick up the 2nd row ?

Nevermind. You could use the csv module to open the file and parse the lines lazily. readlines() is putting the whole thing in memory before you even start filtering — roganjosh
– roganjosh, Commented Jun 22, 2019 at 12:58

roganjosh · Accepted Answer · 2019-06-22 13:09:31Z

Nothing to test with but you can iterate files and lazily load the lines. Perhaps this is more efficient:

rebuilt = [] with open(dir+filename, 'r') as infile: for row in infile: if row[:4] == 'TEST': rebuild.append(row.split('|')) df = pd.DataFrame(rebuilt, columns= ['Type','Amt','Desc','Value1','Value2'])

readlines() is going to load the whole thing into memory, regardless, so you can filter the lines as you read them instead. You're also splitting each line before checking a condition, so it may be faster to check against a list slice.

Collectives™ on Stack Overflow

Python pandas read file, write to excel

1 Answer 1

Comments

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Related