Creating multiple dataframes with a loop

Question

This undoubtedly reflects lack of knowledge on my part, but I can't find anything online to help. I am very new to programming. I want to load 6 csvs and do a few things to them before combining them later. The following code iterates over each file but only creates one dataframe, called df.

    files = ('data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv')
    dfs = ('df1', 'df2', 'df3', 'df4', 'df5', 'df6')
    for df, file in zip(dfs, files):
        df = pd.read_csv(file)
        print(df.shape)
        print(df.dtypes)
        print(list(df))

I think you create 6 dataframes but keep only the last one, correct? df gets overwritten in each iteration. BTW, are there a couple of quotes missing on the second line?
– Gerard H. Pille, Commented Feb 20, 2018 at 14:55
Quotes are missing on the first line too. Your code doesn't even pass a syntax check, let alone create a single dataframe.
– Gerard H. Pille, Commented Feb 20, 2018 at 15:00
Thanks. The quotes are there in my local code. I was just sloppy in abbreviating it for the post.
– Bob Isahofferer, Commented Feb 20, 2018 at 15:04
You can store them in a list and concatenate the dataframes later: before the for loop, ls = []; as the last line in the for loop, ls.append(df); and after the for loop, pd.concat(ls).
– DJK, Commented Feb 20, 2018 at 15:12
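
The list-and-concatenate suggestion in the last comment can be sketched roughly as follows. This is a minimal sketch, not the poster's code: in-memory CSV text (via io.StringIO) stands in for the real files so the example runs on its own, and the column names a and b are invented.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the CSV files; pd.read_csv also accepts file-like objects.
sources = [
    io.StringIO("a,b\n1,2\n3,4\n"),
    io.StringIO("a,b\n5,6\n7,8\n"),
]

ls = []                       # before the loop
for source in sources:
    df = pd.read_csv(source)
    # ... per-file processing would go here ...
    ls.append(df)             # last line in the loop

# After the loop: one concatenation. ignore_index=True is optional and
# simply renumbers the rows of the combined dataframe from 0.
combined = pd.concat(ls, ignore_index=True)
print(combined.shape)
```

Every dataframe stays available in ls by position (ls[0], ls[1], ...) until the concatenation.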

Keith Dowd · Accepted Answer · 2018-02-20 15:24:16Z

I think you think your code is doing something that it is not actually doing.

Specifically, this line: df = pd.read_csv(file)

You might think that in each iteration through the for loop this line is being executed and modified with df being replaced with a string in dfs and file being replaced with a filename in files. While the latter is true, the former is not.

Each iteration through the for loop reads a csv file and stores the resulting dataframe in the variable df, overwriting the dataframe that was read in during the previous iteration. In other words, df in your for loop is not being replaced with the variable names you defined in dfs.

The key takeaway here is that strings (e.g., 'df1', 'df2', etc.) cannot be substituted and used as variable names when executing code.
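
A small sketch of this point, using plain strings instead of dataframes so that it runs without any files (the values here are invented for illustration):

```python
names = ('df1', 'df2')
values = ('first', 'second')   # stand-ins for the dataframes

for df, value in zip(names, values):
    # At this point df holds the string 'df1' (then 'df2').
    df = value                 # rebinds the name df; no variable called df1 is created

print(df)                      # the value assigned in the last iteration
print('df1' in globals())      # checks whether a variable named df1 exists
```

The loop only ever assigns to the single name df, which is why only the last value survives.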

One way to achieve the result you want is to store each dataframe returned by pd.read_csv() in a dictionary, where the key is the name of the dataframe (e.g., 'df1', 'df2', etc.) and the value is the dataframe itself.

    list_of_dfs = {}
    for df, file in zip(dfs, files):
        list_of_dfs[df] = pd.read_csv(file)
        print(list_of_dfs[df].shape)
        print(list_of_dfs[df].dtypes)
        print(list(list_of_dfs[df]))

You can then reference each of your dataframes like this:

    print(list_of_dfs['df1'])
    print(list_of_dfs['df2'])

You can learn more about dictionaries here:

https://docs.python.org/3.6/tutorial/datastructures.html#dictionaries

I thought I read you can use strings as variable names, but you shouldn't do it.
You are right. You can use them as variable names, but doing so makes your code harder to understand. Specifically, you could write exec(s + "=1") (eval only accepts expressions, so an assignment needs exec) and, if s is a string that is a valid Python identifier, this code would assign 1 to the variable with the name stored in s. There's a good post about why you shouldn't use eval here: stackoverflow.com/questions/1933451/….
Is it possible to reference each of the dataframes by using df1 instead of list_of_dfs['df1'] ?
@KeithDowd This works quite well, but I was wondering how to name the dataframes df1, df2, ... instead of list_of_dfs['df1']. Any idea?
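
One possible way to get plain names such as df1 and df2 is sequence unpacking: pull the dataframes out of the dictionary and assign them to names in a single statement. The sketch below is an assumption about what the commenters are after, not part of the answer; in-memory CSV text stands in for the files, and the number of names on the left must match the number of dataframes.

```python
import io
import pandas as pd

# Hypothetical stand-ins for 'data1.csv' and 'data2.csv'.
sources = (io.StringIO("x\n1\n2\n"), io.StringIO("x\n3\n"))

list_of_dfs = {}
for name, source in zip(('df1', 'df2'), sources):
    list_of_dfs[name] = pd.read_csv(source)

# Unpack the dictionary values into ordinary variable names.
df1, df2 = (list_of_dfs[name] for name in ('df1', 'df2'))

print(df1.shape, df2.shape)
```

The names are still written out by hand, so this trades the dictionary lookup for one explicit assignment line rather than generating variable names from strings.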

ilia timofeev · Accepted Answer · 2018-02-20 16:45:26Z

Use a dictionary to store your DataFrames and access them by name:

    import pandas as pd

    files = ('data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv')
    dfs_names = ('df1', 'df2', 'df3', 'df4', 'df5', 'df6')
    dfs = {}
    for dfn, file in zip(dfs_names, files):
        dfs[dfn] = pd.read_csv(file)
        print(dfs[dfn].shape)
        print(dfs[dfn].dtypes)
    print(dfs['df3'])

Use a list to store your DataFrames and access them by index:

    import pandas as pd

    files = ('data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv')
    dfs = []
    for file in files:
        dfs.append(pd.read_csv(file))
        # dfs[-1] is the dataframe that was just appended
        print(dfs[-1].shape)
        print(dfs[-1].dtypes)
    print(dfs[2])

Do not store the intermediate DataFrames; just process each one and add it to the resulting DataFrame:

    import pandas as pd

    files = ('data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv')
    df = pd.DataFrame()
    for file in files:
        df_n = pd.read_csv(file)
        print(df_n.shape)
        print(df_n.dtypes)
        # do whatever you want to do with df_n
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        df = pd.concat([df, df_n])
    print(df)

If you will process them differently, then you do not need a general structure to store them. Simply handle each one independently:

    import pandas as pd

    df = pd.DataFrame()

    def do_general_stuff(d):
        # here we do common things with the DataFrame
        print(d.shape, d.dtypes)

    # DataFrame.append was removed in pandas 2.0, so pd.concat is used below
    df1 = pd.read_csv("data1.csv")
    # do whatever you want with df1
    do_general_stuff(df1)
    df = pd.concat([df, df1])
    del df1

    df2 = pd.read_csv("data2.csv")
    # do whatever you want with df2
    do_general_stuff(df2)
    df = pd.concat([df, df2])
    del df2

    df3 = pd.read_csv("data3.csv")
    # do whatever you want with df3
    do_general_stuff(df3)
    df = pd.concat([df, df3])
    del df3
    # ... and so on

And one geeky way, but don't ask how it works:)

    from collections import namedtuple

    import pandas as pd

    files = ['data1.csv', 'data2.csv', 'data3.csv', 'data4.csv', 'data5.csv', 'data6.csv']
    df = namedtuple('Cdfs', ['df1', 'df2', 'df3', 'df4', 'df5', 'df6'])(
        *[pd.read_csv(file) for file in files]
    )
    for df_n in df._fields:
        print(getattr(df, df_n).shape, getattr(df, df_n).dtypes)
    print(df.df3)
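
For readers who do want to know how it works, here is the same idea unrolled into steps. Plain lists stand in for the dataframes (an assumption made so the sketch runs without pandas or CSV files), and only three fields are used to keep it short; Cdfs is the class name from the answer.

```python
from collections import namedtuple

# Stand-ins for the results of pd.read_csv(file); purely illustrative.
loaded = [[1, 2], [3, 4], [5, 6]]

# Step 1: namedtuple() builds a new tuple subclass with one attribute per field name.
Cdfs = namedtuple('Cdfs', ['df1', 'df2', 'df3'])

# Step 2: *loaded unpacks the list into positional arguments, one per field.
df = Cdfs(*loaded)

# Step 3: _fields lists the field names, and getattr looks each one up by its string name.
for field in df._fields:
    print(field, getattr(df, field))

print(df.df3)   # attribute access instead of df[2] or a dictionary lookup
```

Because the result is a tuple, it cannot be extended or reassigned field by field afterwards, which is one reason a dictionary is usually the more flexible choice.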

Gerard H. Pille · Accepted Answer · 2018-02-20 15:22:56Z

A dictionary can store them too

    import pandas as pd

    files = ('doms_stats201610051.csv', 'doms_stats201610052.csv')
    dfsdic = {}
    dfs = ('df1', 'df2')
    for df, file in zip(dfs, files):
        dfsdic[df] = pd.read_csv(file)
        print(dfsdic[df].shape)
        print(dfsdic[df].dtypes)
        print(list(dfsdic[df]))
    print(dfsdic['df1'].shape)
    print(dfsdic['df2'].shape)
