How to name dataframes in a for-loop? [duplicate]

Question

I am attempting to name multiple dataframes using a variable in a for loop. Here is what I tried:

for name in DF['names'].unique():
    df_name = name + '_df'
    df_name = DF.loc[DF['names'] == str(name)]

If one of the names in the DF['names'] column is 'George', the command below should print the beginning of one of the dataframes that was generated.

George_df.head()

But I get an error message:

TypeError: unsupported operand type(s) for +: 'int' and 'str'

Previous questions discuss ways to do this in a dictionary, but I am looking for a way to implement this for a dataframe.

Does this answer your question? How do I create a variable number of variables? ... How can you dynamically create variables via a while loop?
– wwii, Commented May 11, 2020 at 18:48
Probably the most common solution is to keep the objects in a dictionary.
– wwii, Commented May 11, 2020 at 18:52
When posting a question about code that produces an exception, always include the complete traceback: copy and paste it, then format it as code (select it and type Ctrl-K).
– wwii, Commented May 11, 2020 at 18:54

ansev · Accepted Answer · 2020-05-11 19:06:02Z

4

Setup

import pandas as pd

df = pd.DataFrame({'names': ['a', 'a', 'b', 'b'], 'values': list('1234')})
print(df)

  names values
0     a      1
1     a      2
2     b      3
3     b      4

Using globals and DataFrame.groupby

for name, group in df.groupby('names'):
    globals()[f'df_{name}'] = group

print(df_a)

  names values
0     a      1
1     a      2

print(df_b)

  names values
2     b      3
3     b      4

Although using globals is not recommended, I suggest you use a dictionary

dfs = dict(df.groupby('names').__iter__())
print(dfs['a'])

  names values
0     a      1
1     a      2
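A common follow-up is applying the same operation to every sub-frame held in the dictionary. The snippet below is a minimal sketch of that, reusing the setup data from this answer; the 'Num' column is a hypothetical example, and the `.copy()` call is an assumption made so that each sub-frame can be modified without touching the original `df`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'names': ['a', 'a', 'b', 'b'], 'values': list('1234')})

# One independent sub-frame per distinct name, keyed by that name.
dfs = {name: group.copy() for name, group in df.groupby('names')}

# Apply the same operation to every sub-frame.
# 'Num' is a hypothetical column: a 1-based counter within each group.
for name, group in dfs.items():
    group['Num'] = np.arange(1, len(group) + 1)

print(dfs['b'])
```

Because everything lives in one dictionary, the loop does not need to know the names in advance, which is the main practical advantage over separately named variables.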

edited May 11, 2020 at 19:06

answered May 11, 2020 at 18:52


8 Comments

r.ook Over a year ago

Minor semantics - the phrasing sounds like you're recommending globals over using a dict. I'm quite sure that's not the case since you don't sound insane to me.

arkadiy Over a year ago

@ansev thank you for this explanation! Can I use the same command to perform an operation within each of the dictionaries? Like: df_{name}['Num'] = np.arange(1, 521)

arkadiy Over a year ago

@r.ook Why is using globals insane? Can it damage something?

r.ook Over a year ago

Using the global scope is usually frowned upon, as it clutters your namespace and makes your code harder to work with as it grows more complex. If you can get away with just having one dict to manage all your variable names instead of, say, 100 names in your global scope, it makes life much easier. That, and you might unwittingly overwrite some existing names.

ansev Over a year ago

I recommend using a dictionary and I discourage the use of globals; I think I explained it wrong in my answer :) @r.ook
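The overwriting risk mentioned in this thread can be illustrated with a small sketch. The data and the `df_summary` helper are hypothetical, chosen only so that a generated name collides with an existing one when the code runs as an ordinary script or module.

```python
import pandas as pd

# Hypothetical data in which one group happens to be called 'summary'.
df = pd.DataFrame({'names': ['a', 'summary'], 'values': [1, 2]})

# A hypothetical helper that already lives in the module namespace.
def df_summary():
    return 'important result'

for name, group in df.groupby('names'):
    # No warning is raised when an existing global name is rebound.
    globals()[f'df_{name}'] = group

# The helper has been replaced by a dataframe.
print(type(df_summary).__name__)
```

With a dictionary, the same collision would only replace a key inside that dictionary, and the helper function would stay intact.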


webb · Accepted Answer · 2020-05-11 18:53:00Z

I would recommend going with a dictionary structure like so:

test_dict = {}
test_dict["George"] = pd.DataFrame({"A": [1, 2, 3, 4, 5]})

In your case:

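test_dict = {}
for name in DF['names'].unique():
    # An f-string also works when name is not a string,
    # which avoids the TypeError from name + '_df'.
    df_name = f'{name}_df'
    # Compare against name itself so non-string values still match.
    test_dict[df_name] = DF.loc[DF['names'] == name]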

But if you need to set new variables, this post will explain how to create them.

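for name in DF['names'].unique():
    # An f-string also works when name is not a string.
    df_name = f'{name}_df'
    # Creates a global variable named after df_name (not recommended).
    globals()[df_name] = DF.loc[DF['names'] == name]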

We shouldn't use Series.unique + boolean indexing; this is slow. We should use groupby here.
@ansev I agree. I was trying to align it similar to his code, but yours is definitely more efficient.
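For readers wondering where the TypeError in the question comes from: the message says the left operand of + was an int, so at least one value in DF['names'] was not a string. The sketch below reproduces that under an assumed, hypothetical dataset (an object column mixing 'George' with the integer 7) and then builds the dictionary with groupby, as suggested in the comment above. Passing sort=False is a choice made here so that string and integer keys never have to be ordered against each other.

```python
import pandas as pd

# Hypothetical data: an object column mixing a string and a Python int.
DF = pd.DataFrame({'names': ['George', 'George', 7], 'values': [1, 2, 3]})

try:
    for name in DF['names'].unique():
        df_name = name + '_df'  # fails once name is the int 7
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# An f-string formats any key type, and groupby splits the frame in one pass
# instead of building one boolean mask per name.
dfs = {f'{name}_df': group for name, group in DF.groupby('names', sort=False)}

print(list(dfs))
print(dfs['George_df'].head())
```

Looking up dfs['George_df'] then plays the role that the George_df variable was meant to play in the question.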
