I am attempting to name multiple dataframes using a variable in a for loop. Here is what I tried:

for name in DF['names'].unique():
    df_name = name + '_df'
    df_name = DF.loc[DF['names'] == str(name)]

If one of the names in the DF['names'] column is 'George', the command below should print the beginning of one of the dataframes that was generated.

George_df.head() 

But I get an error message:

TypeError: unsupported operand type(s) for +: 'int' and 'str' 

Previous questions discuss ways to do this in a dictionary, but I am looking for a way to implement this for a dataframe.

  • Does this answer your question? How do I create a variable number of variables? ... How can you dynamically create variables via a while loop? Commented May 11, 2020 at 18:48
  • Probably the most common solution is to keep the objects in a dictionary. Commented May 11, 2020 at 18:52
  • When posting a question about code that produces an Exception, always include the complete Traceback - copy and paste it then format it as code (select it and type ctrl-k) Commented May 11, 2020 at 18:54

2 Answers

Setup

import pandas as pd

df = pd.DataFrame({'names': ['a', 'a', 'b', 'b'], 'values': list('1234')})
print(df)

  names values
0     a      1
1     a      2
2     b      3
3     b      4

Using globals and DataFrame.groupby

for name, group in df.groupby('names'):
    globals()[f'df_{name}'] = group

print(df_a)
  names values
0     a      1
1     a      2

print(df_b)
  names values
2     b      3
3     b      4

Using globals is not recommended, though; I suggest using a dictionary instead:

dfs = dict(df.groupby('names').__iter__())
print(dfs['a'])

  names values
0     a      1
1     a      2
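
With a dictionary, the same operation can be applied to every group in one pass. The following is only a sketch: the `Num` column and its numbering are hypothetical, chosen to illustrate the idea, and `assign` is used so that each entry is a new DataFrame rather than a modified slice of `df`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'names': ['a', 'a', 'b', 'b'], 'values': list('1234')})

# Build the dictionary and add a hypothetical 'Num' column to every group.
# assign() returns a new DataFrame, so the original df is left unchanged.
dfs = {
    name: group.assign(Num=np.arange(1, len(group) + 1))
    for name, group in df.groupby('names')
}

print(dfs['a'])
```

Each DataFrame is then reached by key, for example `dfs['b']`, instead of through a separately named variable.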

8 Comments

Minor semantics - the phrasing sounds like you're recommending globals over using a dict. I'm quite sure that's not the case since you don't sound insane to me.
@ansev thank you for this explanation! Can I use the same command to perform an operation within each of the dictionaries? Like: df_{name}['Num'] = np.arange(1, 521)
@r.ook Why is using globals insane? Can it damage something?
Using the global scope is usually frowned upon because it clutters your namespace and makes your code harder to work with as it grows more complex. If you can get away with one dict to manage all your variable names instead of, say, 100 names in your global scope, it makes life much easier. That, and you might unwittingly overwrite some existing names.
I recommend using a dictionary and I discourage the use of globals; I think I explained it badly in my answer :) @r.ook

I would recommend going with a dictionary structure like so:

test_dict = {}
test_dict["George"] = pd.DataFrame({"A": [1, 2, 3, 4, 5]})

In your case:

test_dict = {}
for name in DF['names'].unique():
    df_name = name + '_df'
    test_dict[df_name] = DF.loc[DF['names'] == str(name)]
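
Note that the TypeError in the question (`'int' and 'str'`) indicates that at least one value in the names column is an integer, so `name + '_df'` would still fail on that value. A minimal sketch of one way around this, assuming a hypothetical `DF` with mixed names: group on the string form of the column and build the keys with an f-string.

```python
import pandas as pd

# Hypothetical data: the 'names' column mixes strings and integers.
DF = pd.DataFrame({'names': ['George', 'George', 7, 7],
                   'values': [1, 2, 3, 4]})

# Group on the string form of the names so every key is a str,
# then build the dictionary keys with an f-string instead of '+'.
frames = {
    f'{name}_df': group
    for name, group in DF.groupby(DF['names'].astype(str))
}

print(frames['George_df'].head())
```

This also replaces the `unique()` loop with a single `groupby` pass.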

But if you need to set new variables, this post will explain how to create them.

for name in DF['names'].unique():
    df_name = name + '_df'
    globals()[df_name] = DF.loc[DF['names'] == str(name)]

2 Comments

We shouldn't use Series.unique + boolean indexing; it is slow. We should use groupby here.
@ansev I agree. I was trying to keep it close to his code, but yours is definitely more efficient.
