How to concatenate three excels files xlsx using python?

Question

Hello I would like to concatenate three excels files xlsx using python.

I have tried using openpyxl, but I don't know which function could help me to append three worksheet into one.

Do you have any ideas how to do that ?

Thanks a lot

this typically does not work ... in my experience you must read in all 3 work xls files . then manually merge them (somehow) , then write out to a new xls file ... — Joran Beasley
– Joran Beasley, Commented Apr 3, 2013 at 16:54

DSM · Accepted Answer · 2016-01-25 13:53:40Z

26

Here's a pandas-based approach. (It's using openpyxl behind the scenes.)

import pandas as pd # filenames excel_names = ["xlsx1.xlsx", "xlsx2.xlsx", "xlsx3.xlsx"] # read them in excels = [pd.ExcelFile(name) for name in excel_names] # turn them into dataframes frames = [x.parse(x.sheet_names[0], header=None,index_col=None) for x in excels] # delete the first row for all frames except the first # i.e. remove the header row -- assumes it's the first frames[1:] = [df[1:] for df in frames[1:]] # concatenate them.. combined = pd.concat(frames) # write it out combined.to_excel("c.xlsx", header=False, index=False)

edited Jan 25, 2016 at 13:53

answered Apr 3, 2013 at 18:49

DSM

355k67 gold badges606 silver badges504 bronze badges

Sign up to request clarification or add additional context in comments.

11 Comments

Auré Vat Over a year ago

thanks DSM, it looks much simplier, however panda needs numpy which can't run on Windows 64bits machine. Do you think I could try to do the same thing with openpyxl

John Y Over a year ago

@AuréVat: Numpy needs 32-bit Python, not 32-bit Windows. Lots of people run 32-bit Python on 64-bit Windows. You can have multiple Python environments on one machine.

John Y Over a year ago

@AuréVat: Also, there is at least one good unofficial 64-bit numpy build: stackoverflow.com/questions/11200137/…

BajajG Over a year ago

This code will append the first file twice - once with header, and next without header. The line frames[1:]= [df[1:] for df in frames] needs to be changed to frames_new=[df[1:] for df in frames] and then concatenate this new frame by changing combined = pd.concat(frames) to combined = pd.concat(frames_new)

Do Not Track Me Over a year ago

Just a note, i tried this with about 60 files, it hung for a long time and then finally produced a broken result. i tried reducing it to only 2 files and then got a missing module warning. after installing openpyxl, and running it again with the 60 files, it was super fast. not sure why that happened. but interesting gotcha.

|

Henry Keiter · Accepted Answer · 2013-04-03 18:26:42Z

I'd use xlrd and xlwt. Assuming you literally just need to append these files (rather than doing any real work on them), I'd do something like: Open up a file to write to with xlwt, and then for each of your other three files, loop over the data and add each row to the output file. To get you started:

import xlwt import xlrd wkbk = xlwt.Workbook() outsheet = wkbk.add_sheet('Sheet1') xlsfiles = [r'C:\foo.xlsx', r'C:\bar.xlsx', r'C:\baz.xlsx'] outrow_idx = 0 for f in xlsfiles: # This is all untested; essentially just pseudocode for concept! insheet = xlrd.open_workbook(f).sheets()[0] for row_idx in xrange(insheet.nrows): for col_idx in xrange(insheet.ncols): outsheet.write(outrow_idx, col_idx, insheet.cell_value(row_idx, col_idx)) outrow_idx += 1 wkbk.save(r'C:\combined.xls')

If your files all have a header line, you probably don't want to repeat that, so you could modify the code above to look more like this:

firstfile = True # Is this the first sheet? for f in xlsfiles: insheet = xlrd.open_workbook(f).sheets()[0] for row_idx in xrange(0 if firstfile else 1, insheet.nrows): pass # processing; etc firstfile = False # We're done with the first sheet.

I didn't think xlrd could handle .xlsx files, but openpyxl could. Am I mistaken?
@DSM It can handle .xlsx, at least I've never had trouble with it.
Well, I'll be -- looks like I'm behind the times! I only had 0.7.1 installed, and it was giving XLRDErrors, but 0.9.0 worked just fine on them. Learn something new every day!
@AuréVat See my edit, but remember that the code I gave is only a starting point.
Thanks I will play with it, and get back if I have more questions thanks a lot

user9347860 · Accepted Answer · 2018-09-23 23:32:48Z

When I combine excel files (mydata1.xlsx, mydata2.xlsx, mydata3.xlsx) for data analysis, here is what I do:

import pandas as pd import numpy as np import glob all_data = pd.DataFrame() for f in glob.glob('myfolder/mydata*.xlsx'): df = pd.read_excel(f) all_data = all_data.append(df, ignore_index=True)

Then, when I want to save it as one file:

writer = pd.ExcelWriter('mycollected_data.xlsx', engine='xlsxwriter') all_data.to_excel(writer, sheet_name='Sheet1') writer.save()

pbarill · Accepted Answer · 2018-06-19 15:11:05Z

Solution with openpyxl only (without a bunch of other dependencies).

This script should take care of merging together an arbitrary number of xlsx documents, whether they have one or multiple sheets. It will preserve the formatting.

There's a function to copy sheets in openpyxl, but it is only from/to the same file. There's also a function insert_rows somewhere, but by itself it won't insert any rows. So I'm afraid we are left to deal (tediously) with one cell at a time.

As much as I dislike using for loops and would rather use something compact and elegant like list comprehension, I don't see how to do that here as this is a side-effect show.

Credit to this answer on copying between workbooks.

#!/usr/bin/env python3 #USAGE #mergeXLSX.py <a bunch of .xlsx files> ... output.xlsx # #where output.xlsx is the unified file #This works FROM/TO the xlsx format. Libreoffice might help to convert from xls. #localc --headless --convert-to xlsx somefile.xls import sys from copy import copy from openpyxl import load_workbook,Workbook def createNewWorkbook(manyWb): for wb in manyWb: for sheetName in wb.sheetnames: o = theOne.create_sheet(sheetName) safeTitle = o.title copySheet(wb[sheetName],theOne[safeTitle]) def copySheet(sourceSheet,newSheet): for row in sourceSheet.rows: for cell in row: newCell = newSheet.cell(row=cell.row, column=cell.col_idx, value= cell.value) if cell.has_style: newCell.font = copy(cell.font) newCell.border = copy(cell.border) newCell.fill = copy(cell.fill) newCell.number_format = copy(cell.number_format) newCell.protection = copy(cell.protection) newCell.alignment = copy(cell.alignment) filesInput = sys.argv[1:] theOneFile = filesInput.pop(-1) myfriends = [ load_workbook(f) for f in filesInput ] #try this if you are bored #myfriends = [ openpyxl.load_workbook(f) for k in range(200) for f in filesInput ] theOne = Workbook() del theOne['Sheet'] #We want our new book to be empty. Thanks. createNewWorkbook(myfriends) theOne.save(theOneFile)

Tested with openpyxl 2.5.4, python 3.4.

Dhruv Kadia · Accepted Answer · 2019-05-07 21:39:19Z

You can simply use pandas and os library to do this.

import pandas as pd import os #create an empty dataframe which will have all the combined data mergedData = pd.DataFrame() for files in os.listdir(): #make sure you are only reading excel files if files.endswith('.xlsx'): data = pd.read_excel(files, index_col=None) mergedData = mergedData.append(data) #move the files to other folder so that it does not process multiple times os.rename(files, 'path to some other folder')

mergedData DF will have all the combined data which you can export in a separate excel or csv file. Same code will work with csv files as well. just replace it in the IF condition

Scott Weaver · Accepted Answer · 2019-05-14 15:33:54Z

Just to add to p_barill's answer, if you have custom column widths that you need to copy, you can add the following to the bottom of copySheet:

 for col in sourceSheet.column_dimensions: newSheet.column_dimensions[col] = sourceSheet.column_dimensions[col]

I would just post this in a comment on his or her answer but my reputation isn't high enough.

Collectives™ on Stack Overflow

How to concatenate three excels files xlsx using python?

6 Answers 6

11 Comments

10 Comments

Comments

1 Comment

Comments

Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

6 Answers 6

11 Comments

10 Comments

Comments

1 Comment

Comments

Comments

Linked

Related