
I have a big Excel file that contains many different sheets. All the sheets have the same structure, like:

Name
col1  col2  col3  col4
1     1     2     4
4     3     2     1
  • How can I concatenate (vertically) all these sheets in Pandas without having to name each of them manually? If these were files, I could use glob to obtain a list of files in a directory. But here, for Excel sheets, I am lost.
  • Is there a way to create a variable in the resulting DataFrame that identifies the sheet name from which the data comes?

Thanks!

4 Answers


Try this:

dfs = pd.read_excel(filename, sheet_name=None, skiprows=1) 

This will return a dictionary of DataFrames, which you can easily concatenate using pd.concat(dfs), or, as @jezrael has already posted in his answer:

df = pd.concat(pd.read_excel(filename, sheet_name=None, skiprows=1)) 

sheet_name: None -> All sheets as a dictionary of DataFrames
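
If you do not need the sheet names at all, a minimal sketch along the same lines (reusing the filename variable from above; ignore_index simply gives the result a fresh 0..n-1 index) would be:

dfs = pd.read_excel(filename, sheet_name=None, skiprows=1)  # {sheet name: DataFrame}
df = pd.concat(dfs.values(), ignore_index=True)             # stack vertically, fresh index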

UPDATE:

Is there a way to create a variable in the resulting DataFrame that identifies the sheet name from which the data comes?

dfs = pd.read_excel(filename, sheet_name=None, skiprows=1) 

assuming we've got the following dict:

In [76]: dfs
Out[76]:
{'d1':    col1  col2  col3  col4
 0     1     1     2     4
 1     4     3     2     1,
 'd2':    col1  col2  col3  col4
 0     3     3     4     6
 1     6     5     4     3}

Now we can add a new column:

In [77]: pd.concat([df.assign(name=n) for n, df in dfs.items()])
Out[77]:
   col1  col2  col3  col4 name
0     1     1     2     4   d1
1     4     3     2     1   d1
0     3     3     4     6   d2
1     6     5     4     3   d2
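
If you prefer a clean 0..n-1 index instead of the repeated per-sheet index above, passing ignore_index=True to the same call should work (a small sketch with the same dfs):

pd.concat([df.assign(name=n) for n, df in dfs.items()], ignore_index=True)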

9 Comments

Then pd.concat(dfs.values()) yields the result.
nice but how can I get rid of the Name super column? I thought I could use some read_excel (skip = 1) somewhere with the dictionary?
@blacksite, yes, thank you. I thought OP knows it already... ;-)
@ℕʘʘḆḽḘ, use skiprows=1
@ℕʘʘḆḽḘ, i've updated my answer - is that what you want?

Taking a note from this question:

import pandas as pd

file = pd.ExcelFile('file.xlsx')
names = file.sheet_names  # see all sheet names
df = pd.concat([file.parse(name) for name in names])

Results:

df
Out[6]:
   A  B
0  1  3
1  2  4
0  5  6
1  7  8

Then you can run df.reset_index(), to, well, reset the index.
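
For example (drop=True discards the old per-sheet index instead of keeping it as an extra column):

df = df.reset_index(drop=True)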

Edit: pandas.ExcelFile.parse is, according to the pandas docs:

Equivalent to read_excel(ExcelFile, ...) See the read_excel docstring for more info on accepted parameters
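
So the skiprows trick from the other answers should work here as well, and assign can tag each row with its sheet name (a sketch reusing file and names from above):

df = pd.concat(
    [file.parse(name, skiprows=1).assign(sheet=name) for name in names],
    ignore_index=True,
)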

3 Comments

Thanks, but why use file.parse instead of read_excel?
See above, please.
great. thanks guys but I had to give this one to the fastest one! :)

First add the parameter sheet_name=None to get a dict of DataFrames and skiprows=1 to omit the first row, then use concat to get a MultiIndex DataFrame.

Last, use reset_index to turn the first index level into a column:

df = pd.concat(pd.read_excel('multiple_sheets.xlsx', sheet_name=None, skiprows=1))
df = df.reset_index(level=1, drop=True).rename_axis('filenames').reset_index()
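
An equivalent spelling names the index levels up front via concat's names parameter ('sheet' and 'row' are just illustrative level names here):

dfs = pd.read_excel('multiple_sheets.xlsx', sheet_name=None, skiprows=1)
df = (pd.concat(dfs, names=['sheet', 'row'])
        .reset_index(level='row', drop=True)
        .reset_index())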

3 Comments

thanks jezrael. same issue, how can I get rid of the first line for every sheet here?
skiprows=1 should help.
great. thanks guys but I had to give this one to the fastest one! :)
import os
import glob
import pandas as pd

file_save_location = 'myfolder'   # folder for the consolidated CSVs
file_name = 'filename'            # base name for the output files
location = 'myfolder1'            # folder containing the source workbooks

os.chdir(location)
files_xls = glob.glob("*.xls*")                   # all Excel workbooks in the folder
excel_names = [f for f in files_xls]
sheets = pd.ExcelFile(files_xls[0]).sheet_names   # sheet names taken from the first workbook

def combine_excel_to_dfs(excel_names, sheet_name):
    # Read the same sheet from every workbook and stack them vertically
    sheet_frames = [pd.read_excel(x, sheet_name=sheet_name) for x in excel_names]
    combined_df = pd.concat(sheet_frames).reset_index(drop=True)
    return combined_df

i = 0
while i < len(sheets):
    process = sheets[i]
    consolidated_file = combine_excel_to_dfs(excel_names, process)
    # Include the sheet name in the output path so each sheet gets its own CSV
    consolidated_file.to_csv(os.path.join(file_save_location, file_name + '_' + process + '.csv'))
    i = i + 1
else:
    print("we done on consolidation part")

1 Comment

It would be better if you could edit the post with some description followed by the code.
