Importing data from multiple files using Python

Question

I need to build an application that imports all the Excel files in a given folder and combines them into a single dataframe. The dataframe should look like this:

Expected Data Frame

As the image shows, one of the columns of the dataframe is the name of the file.

I have successfully added that column to the final dataframe with the following code:

    import pandas as pd
    import os

    path = 'C:/Users/Administrator/Desktop/Zerodha/Day2'
    lst = os.listdir(path)
    files = [os.path.join(path, x) for x in lst]
    print(lst)

    dataframes_lst = []
    for file in files:
        # basename works with both '/' and '\' separators
        filename = os.path.basename(file)
        # column 0 is the date, column 4 is the value; name the value column after the file
        dataframe = pd.read_csv(file, usecols=[0, 4], names=["date", filename], index_col=["date"])
        dataframes_lst.append(dataframe)

    # place the per-file columns side by side, aligned on the date index
    df = pd.concat(dataframes_lst, axis=1)
    print(df)
    df.to_csv('data.csv')

The dataframe produced by this code looks like this:

For reference, here is a snippet of one of the Excel files:

Excel snippet

As shown, the result also contains many NaN values. I tried to remove them with df.dropna(inplace=True) and by following the suggestions in this post:

How to drop rows of Pandas DataFrame whose value in a certain column is NaN

But the resulting dataframe still contains NaN values.

I have solved the NaN issue by using the fillna function.

– Huzefa Sadikot, Commented Nov 19, 2020 at 9:22
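
One possible explanation for the NaN values, assuming the files do not all cover the same dates (the actual files are not shown, so this is only a guess): pd.concat with axis=1 aligns rows on the index and leaves NaN wherever a file has no entry for a date. The sketch below uses made-up series named a.csv and b.csv to show how dropna and fillna treat such gaps.

```python
import pandas as pd

# Hypothetical data: two "files" whose dates only partly overlap.
a = pd.Series([10.5, 11.0], index=["2020-11-17", "2020-11-18"], name="a.csv")
b = pd.Series([20.0, 21.5], index=["2020-11-18", "2020-11-19"], name="b.csv")

# Outer join on the index: a date missing from one file becomes NaN there.
wide = pd.concat([a, b], axis=1)

# dropna() removes every row that has at least one NaN,
# so only dates present in all files survive.
complete_rows = wide.dropna()

# how="all" removes only rows in which every value is NaN.
non_empty_rows = wide.dropna(how="all")

# fillna() keeps all rows and replaces the gaps with a chosen value.
filled = wide.fillna(0)

print(wide)
print(complete_rows)
print(filled)
```

Note that dropna and fillna return a new dataframe unless inplace=True is passed, so the result has to be assigned to a name before it is used.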

Orkun Berk Yuzbasioglu · Accepted Answer · 2020-11-18 10:14:09Z

Regarding

My doubt is that how do I loop through all the files in the directory and extract data of each file in the required format

You can loop through all the files in the directory and extract the data, with the filename as the header of the dataframe, like this:

    import pandas as pd
    import os

    path = './data'
    lst = os.listdir('./data/')
    files = [os.path.join(path, el) for el in lst]

and the structure of example.xlsx is:

    dataframes_lst = []
    for file in files:
        filename = file.split('/')[-1]
        dataframe = pd.read_excel(file, usecols=[3], names=[filename])
        dataframes_lst.append(dataframe)

    df = pd.concat(dataframes_lst, axis=1)
    print(df)

Here, the dataframes are concatenated along axis=1, and the output of print(df) is:
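
A minimal, self-contained sketch of the same loop, adapted to the layout implied by the question's read_csv call: CSV files with no header row, the date in the first column and the value of interest in the fifth. The function name combine_files and the sample data are hypothetical, and CSV is used instead of Excel only so the example runs without an Excel engine installed.

```python
import os
import tempfile

import pandas as pd


def combine_files(folder):
    """Return one DataFrame with a date index and one column per CSV file."""
    columns = []
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".csv"):
            continue  # ignore anything that is not a CSV file
        # Assumption: no header row, date in column 0, value in column 4.
        raw = pd.read_csv(os.path.join(folder, name), header=None)
        # Index by date, keep the value column, and name it after the file.
        column = raw.set_index(0)[4].rename(name).rename_axis("date")
        columns.append(column)
    # axis=1 places the per-file columns side by side, aligned on the date.
    return pd.concat(columns, axis=1)


# Usage with two small made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    samples = {
        "a.csv": "2020-11-17,1,2,3,10.5\n2020-11-18,1,2,3,11.0\n",
        "b.csv": "2020-11-17,1,2,3,20.0\n2020-11-18,1,2,3,21.5\n",
    }
    for name, text in samples.items():
        with open(os.path.join(folder, name), "w") as handle:
            handle.write(text)
    print(combine_files(folder))
```

Using os.path.join and the bare file name from os.listdir avoids splitting the path on '/', which can misbehave on Windows where the joined path contains a backslash.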

Thanks for your time. This is the solution I was looking for.

skarit · Accepted Answer · 2020-11-18 09:20:57Z

Try this:

    import pandas as pd
    from pathlib import Path

    read_path = Path('C:/Users/Administrator/Desktop/Zerodha/Day2')
    df = pd.concat([pd.read_csv(path) for path in read_path.glob('*.csv')])

If you want to read Excel files, use pd.read_excel instead and change the pattern to '*.xlsx'.
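
The one-liner above stacks the files row by row and does not record which file each row came from. A hedged sketch of one way to keep that information with the same pathlib approach: tag each row with its file name while stacking, then pivot so that every file becomes a column. The function name stack_with_source, the column names date, value and file, and the assumed layout (no header row, date in column 0, value in column 4) are illustrative choices, not part of the original answer.

```python
import tempfile
from pathlib import Path

import pandas as pd


def stack_with_source(folder, pattern="*.csv"):
    """Stack all matching files row-wise and record the source file name."""
    frames = []
    for path in sorted(Path(folder).glob(pattern)):
        # Assumption: no header row, date in column 0, value in column 4.
        frame = pd.read_csv(path, header=None, usecols=[0, 4])
        frame.columns = ["date", "value"]
        frame["file"] = path.name  # remember where each row came from
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Usage with two small made-up files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "a.csv").write_text("2020-11-17,1,2,3,10.5\n2020-11-18,1,2,3,11.0\n")
    Path(folder, "b.csv").write_text("2020-11-17,1,2,3,20.0\n2020-11-18,1,2,3,21.5\n")
    long_df = stack_with_source(folder)

# One row per date and one column per file.
wide_df = long_df.pivot(index="date", columns="file", values="value")
print(long_df)
print(wide_df)
```

pivot raises an error if the same date appears twice within one file; pivot_table with an aggregation function would be the usual fallback in that case.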

I am getting the data, but not in the format shown in the Expected Data Frame. Your code gives me the data in the format shown in the Excel snippet; I need it in the format shown in the Expected Data Frame.
