1

I want to loop through a directory and find specific xlsx files and then put them each into separate pandas dataframe. The thing here is that I also want all sheets in those excel files to be in the dataframe.

Below is a sample of code that I implemented, I just need to add the logic to pick all sheets:

import pandas as pd from glob import glob path = 'path_to_file' files = glob(path + '/*file*.xlsx') get_df = lambda f: pd.read_excel(f) dodf = {f: get_df(f) for f in files} dodf[files[2]] --- dictionary of dataframes 
1
  • I have already created a dictionary to access each dataframe. So the current method should be fine, i.e. in a dictionary. Commented Sep 13, 2017 at 19:44

1 Answer 1

1

As described in this answer in Pandas you still have access to the ExcelFile class, which loads the file creating an object.

This object has a .sheet_names property which gives you a list of sheet names in the current file.

xl = pd.ExcelFile('foo.xls') xl.sheet_names # list of all sheet names 

To actually handle the import of the specific sheet, use .parse(sheet_name) on the object of the imported Excel file:

xl.parse(sheet_name) # read a specific sheet to DataFrame 

For your code something like:

get_df = lambda f: pd.ExcelFile(f) dodf = {f: get_df(f) for f in files} 

...gives you dodf a dictionary of ExcelFile objects.

filename = 'yourfilehere.xlsx' a_valid_sheet = dodf[filename].sheet_names[0] # First sheet df = dodf[filename].parse(sheet_name) 
Sign up to request clarification or add additional context in comments.

4 Comments

I do not want to manually input the filename. Is there a way to get it from the dictionary dodf that I have created? I am completely new to Python so I do not know as such how it all works.
yep — but you need you change your pd.read_excel(f) to pd.ExcelFile(f). Once that is done, each object will have the .sheet_names attribute which is a list of sheets in that file.
Yes, I already did. But then, I have to individually parse each of the sheet into the dataframe right?
@ManasJani that's right. But you can iterate over the list of sheet_names to do this, e.g. for sheet in your_xls_obj.sheetnames: df = your_xls_obj.parse(sheet)

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.