I am having trouble reading and writing moderately sized Excel files in pandas. I have five files, each around 300 MB. I need to combine them into one, do some processing, and then save the result (preferably as Excel):
```python
import pandas as pd

f1 = pd.read_excel('File_1.xlsx')
f2 = pd.read_excel('File_2.xlsx')
f3 = pd.read_excel('File_3.xlsx')
f4 = pd.read_excel('File_4.xlsx')
f5 = pd.read_excel('File_5.xlsx')

FULL = pd.concat([f1, f2, f3, f4, f5], axis=0, ignore_index=True, sort=False)
FULL.to_excel('filename.xlsx', index=False)
```

Unfortunately, reading takes far too long (around 15 minutes or so), and writing used up 100% of the memory on my 16 GB RAM PC and was taking so long that I was forced to interrupt the program. Is there any way I can accelerate both the read and the write?
An .xlsx file is a ZIP package containing XML files, so the actual data size can be a lot bigger than the 300 MB on disk. You then create a concatenated frame, which is another 1.5 GB (or 3 GB) in RAM. Then you try to export all of that in one go, which means generating the XML content in memory before it is saved to a new ZIP package.

I tried using the del keyword to delete the variables before attempting to_excel(), but the memory % remained the same.
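A minimal sketch of the kind of changes that usually help here, assuming all five files share the same columns. The calamine engine, the Parquet intermediate file, and the nan_inf_to_errors option are my own suggestions, not something from the original post:

```python
import pandas as pd
import xlsxwriter

files = [f'File_{i}.xlsx' for i in range(1, 6)]

# Read: the 'calamine' engine (pandas >= 2.2, needs the python-calamine
# package) is typically several times faster than the default openpyxl
# reader; drop the engine argument to fall back to the default.
frames = [pd.read_excel(f, engine='calamine') for f in files]

FULL = pd.concat(frames, axis=0, ignore_index=True, sort=False)
del frames  # release the per-file frames so only FULL stays in memory

# Write, option 1: if the consumer does not strictly need .xlsx, a columnar
# format is both faster to write and far smaller (needs pyarrow or fastparquet).
FULL.to_parquet('filename.parquet', index=False)

# Write, option 2: if it must be .xlsx, stream rows with XlsxWriter's
# constant_memory mode instead of to_excel(), which builds the whole
# sheet in memory before saving it into the new ZIP package.
wb = xlsxwriter.Workbook(
    'filename.xlsx',
    {'constant_memory': True,      # flush each row to disk as it is written
     'nan_inf_to_errors': True},   # write NaN cells as errors instead of raising
)
ws = wb.add_worksheet()
ws.write_row(0, 0, FULL.columns.tolist())
for r, row in enumerate(FULL.itertuples(index=False), start=1):
    ws.write_row(r, 0, row)
wb.close()
```

Constant-memory mode requires rows to be written in increasing order, which the loop above does; it trades a little speed for a flat memory profile, so the export no longer scales with the size of the combined frame.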