How can I write a for loop that doesn't overwrite the existing dataframe?

for df in 2011, 2012, 2013:
    df = pd.pivot_table(df, index=["income"], columns=["area"], values=["id"], aggfunc='count')

Right now the for loop above iterates over each of the existing dataframes. How can I make it so the for loop creates a bunch of new dataframes?

2011_pivot, 2012_pivot, 2013_pivot 
  • So would the final output be three dataframes, or one dataframe with all the previous dataframes concatenated? Commented Oct 1, 2018 at 20:04
  • You should use a dict to save the dataframes you are creating, where "2011_pivot", "2012_pivot" and "2013_pivot" are the keys. Commented Oct 1, 2018 at 20:09
  • stackoverflow.com/a/52457013/10292170 stackoverflow.com/a/52508030/10292170 Commented Oct 1, 2018 at 20:29
  • Did an answer below help? If so, feel free to accept one, or ask for clarification. Commented Oct 3, 2018 at 22:50

3 Answers

3

I would generally discourage you from creating lots of variables with related names; it is a dangerous design pattern in Python (although it's common in SAS, for example). A better option is to create a dictionary of dataframes, with each key serving as your 'variable name':

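dfs = {"2011": df_2011, "2012": df_2012, "2013": df_2013}
df_dict = {}
for name, df in dfs.items():
    df_dict["pivot_" + name] = pd.pivot_table(df, index=["income"], columns=["area"], values=["id"], aggfunc='count')

I'm assuming here that your dataframes are held in variables named df_2011, df_2012 and df_2013. A DataFrame has no built-in name attribute, so the year labels are supplied as the dict keys.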
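To make the dictionary approach concrete, here is a minimal runnable sketch. The variable names df_2011 and df_2012 and all the sample values are invented for illustration; only the column names (id, income, area) come from the question.

```python
import pandas as pd

# Hypothetical sample data: the column names follow the question,
# the values are invented purely for illustration.
df_2011 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "income": ["low", "low", "high", "high"],
    "area": ["north", "south", "north", "north"],
})
df_2012 = pd.DataFrame({
    "id": [5, 6, 7],
    "income": ["low", "high", "high"],
    "area": ["south", "south", "north"],
})

# Map each year label to its source dataframe.
frames = {"2011": df_2011, "2012": df_2012}

# Build one pivot table per year; the source dataframes are left untouched.
pivots = {
    f"pivot_{year}": pd.pivot_table(
        frame, index="income", columns="area", values="id", aggfunc="count"
    )
    for year, frame in frames.items()
}

print(sorted(pivots))
print(pivots["pivot_2011"])
```

Each result is then looked up by key, e.g. pivots["pivot_2011"], instead of by a separately named variable.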

1

I don't see any other way than to create a list or a dictionary of dataframes; otherwise you'd have to name them manually.

df_list = [pd.pivot_table(df, index=["income"], columns=["area"], values=["id"], aggfunc='count') for df in (df_2011, df_2012, df_2013)]

You can find an example here.
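A list keeps the results in order but drops the year labels, so tables are addressed by position. The sketch below shows that, and one possible way to recover labels afterwards with zip. The helper count_pivot, the variable names and the sample values are all hypothetical.

```python
import pandas as pd


def count_pivot(frame):
    """Hypothetical helper: count ids per income band and area."""
    return pd.pivot_table(
        frame, index="income", columns="area", values="id", aggfunc="count"
    )


# Invented sample data using the question's column names.
df_2011 = pd.DataFrame({
    "id": [1, 2, 3],
    "income": ["low", "high", "high"],
    "area": ["north", "north", "south"],
})
df_2012 = pd.DataFrame({
    "id": [4, 5],
    "income": ["low", "low"],
    "area": ["south", "south"],
})

years = [2011, 2012]
pivot_list = [count_pivot(frame) for frame in (df_2011, df_2012)]

# A list is addressed by position, so pivot_list[0] is the 2011 table.
first = pivot_list[0]

# zip pairs each year with its table if labels are wanted later.
pivot_by_year = dict(zip(years, pivot_list))
```

If the tables are only ever processed in sequence, the list alone is enough; the dict is worth it once you need to fetch a specific year.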

0

Don't create variables needlessly. Use a dict or list instead, e.g. via a dictionary or list comprehension.

Alternatively, consider MultiIndex columns and a single pd.pivot_table call:

dfs = {2011: df_2011, 2012: df_2012, 2013: df_2013}
comb = pd.concat([v.assign(year=k) for k, v in dfs.items()], ignore_index=True)
df = pd.pivot_table(comb, index='income', columns=['year', 'area'], values='id', aggfunc='count')

Then you can use regular indexing methods to filter for a particular year, e.g.

pivot_2011 = df.iloc[:, df.columns.get_level_values(0) == 2011]
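The combined approach can be tried end to end with the sketch below. The sample values are invented, and selecting a year with DataFrame.xs is one alternative to the boolean mask above, not the only way to do it.

```python
import pandas as pd

# Invented sample data using the question's column names.
df_2011 = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "income": ["low", "low", "high", "high"],
    "area": ["north", "south", "north", "north"],
})
df_2012 = pd.DataFrame({
    "id": [5, 6, 7],
    "income": ["low", "high", "high"],
    "area": ["south", "south", "north"],
})

dfs = {2011: df_2011, 2012: df_2012}

# Tag every row with its year, then stack the frames into one.
comb = pd.concat(
    [frame.assign(year=year) for year, frame in dfs.items()],
    ignore_index=True,
)

# A single pivot whose columns form a (year, area) MultiIndex.
combined = pd.pivot_table(
    comb, index="income", columns=["year", "area"], values="id", aggfunc="count"
)

# Take the cross-section for one year; the 'year' level is dropped,
# leaving plain 'area' columns.
pivot_2011 = combined.xs(2011, axis=1, level="year")
print(pivot_2011)
```

Unlike the boolean mask, xs removes the year level from the result, which can be convenient when the sub-table is used on its own.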
