How can I create multiple new dataframes inside a for loop?

Question

I want to create a for loop that doesn't overwrite the existing dataframes.

for df in 2011, 2012, 2013:
    df = pd.pivot_table(df, index=["income"], columns=["area"], values=["id"], aggfunc='count')

Right now the for loop above iterates over each of the existing dataframes. How can I make it so the for loop creates a bunch of new dataframes?

2011_pivot, 2012_pivot, 2013_pivot

So would the final output be three dataframes, or one dataframe with all the previous dataframes concatenated?
– mad_, Commented Oct 1, 2018 at 20:04
You should use a dict to save the dataframes you are creating, where "2011_pivot", "2012_pivot" and "2013_pivot" are the keys.
– brunormoreira, Commented Oct 1, 2018 at 20:09
stackoverflow.com/a/52457013/10292170 stackoverflow.com/a/52508030/10292170
– ipramusinto, Commented Oct 1, 2018 at 20:29
Did an answer below help? If so, feel free to accept one, or ask for clarification.
– jpp, Commented Oct 3, 2018 at 22:50

Sven Harris · Accepted Answer · 2018-10-01 20:10:14Z

I would generally discourage you from creating lots of variables with related names, which is a dangerous design pattern in Python (although it's common in SAS, for example). A better option is to create a dictionary of dataframes, with each key playing the role of your 'variable name':

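df_dict = {}
# df_2011, df_2012, df_2013 stand in for your existing dataframes
for year, df in {"2011": df_2011, "2012": df_2012, "2013": df_2013}.items():
    df_dict["pivot_" + year] = pd.pivot_table(df, index=["income"], columns=["area"], values=["id"], aggfunc='count')

I'm assuming here that your dataframes are available as df_2011, df_2012 and df_2013. A dataframe does not carry its own name, so the year is supplied as the key of the input dictionary.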
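For illustration, here is a minimal, self-contained sketch of the dictionary approach. The sample data and the names `df_2011`, `df_2012` and `frames` are made up for the example; only the `pd.pivot_table` call is taken from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the real yearly dataframes.
df_2011 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "income": ["low", "low", "high", "high"],
    "area": ["north", "south", "north", "north"],
})
df_2012 = pd.DataFrame({
    "id": [5, 6, 7],
    "income": ["low", "high", "high"],
    "area": ["south", "south", "north"],
})

# Map each year label to its source dataframe.
frames = {"2011": df_2011, "2012": df_2012}

# Build one pivot table per year; the source dataframes are left untouched.
df_dict = {}
for year, df in frames.items():
    df_dict["pivot_" + year] = pd.pivot_table(
        df, index=["income"], columns=["area"], values=["id"], aggfunc="count"
    )

# Look up a result by its key instead of by a variable name.
pivot_2011 = df_dict["pivot_2011"]
```

Each pivot table is then reachable as `df_dict["pivot_2011"]`, `df_dict["pivot_2012"]` and so on, and adding another year only means adding another entry to `frames`.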

Colonder · Accepted Answer · 2018-10-01 20:07:48Z

I don't see any other way than to create a list or a dictionary of dataframes; otherwise you'd have to name them manually.

df_list = [pd.pivot_table(df, index=["income"], columns=["area"], values=["id"], aggfunc='count') for df in (df_2011, df_2012, df_2013)]

You can find an example here.

jpp · Accepted Answer · 2018-10-01 20:17:55Z

Don't create variables needlessly. Use a dict or list instead, e.g. via a dictionary or list comprehension.
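As a rough sketch of what the comprehension versions could look like: the helper `count_pivot` and the `yearly` sample data below are hypothetical names introduced for the example, not part of the question.

```python
import pandas as pd

def count_pivot(df):
    """Count ids per income band and area (same call as in the question)."""
    return pd.pivot_table(
        df, index=["income"], columns=["area"], values=["id"], aggfunc="count"
    )

# Hypothetical inputs: one small dataframe per year.
yearly = {
    2011: pd.DataFrame({
        "id": [1, 2],
        "income": ["low", "high"],
        "area": ["north", "north"],
    }),
    2012: pd.DataFrame({
        "id": [3, 4, 5],
        "income": ["low", "low", "high"],
        "area": ["south", "south", "north"],
    }),
}

# Dictionary comprehension: results keyed by year.
pivots = {year: count_pivot(df) for year, df in yearly.items()}

# List comprehension: results in the same order as the inputs.
pivot_list = [count_pivot(df) for df in yearly.values()]
```

The dictionary form is usually easier to work with when you need to look a result up by year; the list form is enough if you only iterate over the results.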

Alternatively, consider MultiIndex columns and a single pd.pivot_table call:

dfs = {2011: df_2011, 2012: df_2012, 2013: df_2013}
comb = pd.concat([v.assign(year=k) for k, v in dfs.items()], ignore_index=True)
df = pd.pivot_table(comb, index='income', columns=['year', 'area'], values='id', aggfunc='count')

Then you can use regular indexing methods to filter for a particular year, e.g.

pivot_2011 = df.iloc[:, df.columns.get_level_values(0) == 2011]
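A runnable sketch of the single-pivot approach on made-up data follows. The sample dataframes are hypothetical, and the final selection uses `DataFrame.xs` as one possible alternative to the boolean-mask indexing shown above.

```python
import pandas as pd

# Hypothetical yearly dataframes.
dfs = {
    2011: pd.DataFrame({
        "id": [1, 2, 3],
        "income": ["low", "low", "high"],
        "area": ["north", "south", "north"],
    }),
    2012: pd.DataFrame({
        "id": [4, 5],
        "income": ["low", "high"],
        "area": ["north", "north"],
    }),
}

# Tag each row with its year, then stack everything into one frame.
comb = pd.concat([v.assign(year=k) for k, v in dfs.items()], ignore_index=True)

# One pivot whose columns form a (year, area) MultiIndex.
pivot = pd.pivot_table(
    comb, index="income", columns=["year", "area"], values="id", aggfunc="count"
)

# Select a single year; xs drops the year level from the columns.
pivot_2011 = pivot.xs(2011, axis=1, level="year")
```

Unlike the boolean mask, `xs` removes the `year` level, so `pivot_2011` ends up with plain `area` columns.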
