
I would like to concat multiple dataframes into a single dataframe using the names of the dataframes as strings from a list. This is similar to:

df1 = pd.DataFrame({'x': [1, 2, 3], 'y': ['a', 'b', 'c']})
df2 = pd.DataFrame({'x': [4, 5, 6], 'y': ['d', 'e', 'f']})
pd.concat([df1, df2])

but instead I want to provide a list of dataframe names as strings

For example,

pd.concat(['df1', 'df2'])

Is this possible?

2 Comments
  • Variables are stored in the globals namespace, so you can get them with globals()[name]:
    pd.concat([globals()[x] for x in ['df1', 'df2']])
    But this is not idiomatic; you should store your dataframes in a dictionary and reference them from that. Commented Sep 30, 2021 at 22:15
  • @Psidom This is exactly what I was looking for thanks! Write up an answer and I'll accept. I couldn't find this on SE. Commented Sep 30, 2021 at 22:15
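The approach from the accepted comment, as a minimal self-contained sketch (using df1/df2 exactly as defined in the question; globals() works here because the dataframes are module-level variables):

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2, 3], 'y': ['a', 'b', 'c']})
df2 = pd.DataFrame({'x': [4, 5, 6], 'y': ['d', 'e', 'f']})

# Look up each name string in the module's global namespace,
# then concatenate the resulting dataframes.
result = pd.concat([globals()[name] for name in ['df1', 'df2']])
```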

3 Answers


Although using globals or exec would answer the question as asked, it is considered bad practice. A better way is to use a dict, like so:

df_dict = {'df1': df1, 'df2': df2}
pd.concat(df for _, df in df_dict.items())
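As an aside (not in the original answer): pd.concat also accepts the mapping itself, in which case the dict keys become an outer index level, so the "names" are preserved in the result:

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1, 2], 'y': ['a', 'b']})
df2 = pd.DataFrame({'x': [3, 4], 'y': ['c', 'd']})
df_dict = {'df1': df1, 'df2': df2}

# Passing the dict directly labels each block with its key
# in a MultiIndex; labeled.loc['df1'] recovers the original df1.
labeled = pd.concat(df_dict)
```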

1 Comment

Thanks for your answer, but this is not what I am asking. I want to be able to insert a list in concat as pd.concat(['df1', 'df2'])

Python code generally has to refer to variables by name at the point where it is written, so selecting values from a list of name strings is tricky. As mentioned in the comments, you could use globals() to get the values of variables in global scope, but a more common practice is to use a dictionary from the beginning instead.

import pandas as pd

dataframes = {
    "df1": pd.DataFrame({'x': [1, 2, 3], 'y': ['a', 'b', 'c']}),
    "df2": pd.DataFrame({'x': [4, 5, 6], 'y': ['d', 'e', 'f']}),
}
to_concat = ["df1", "df2"]
result = pd.concat(dataframes[name] for name in to_concat)

Now the dataframes are all tucked neatly into their own namespace instead of being mixed with other stuff in globals. This is especially useful when the dataframes are read dynamically and you'd have to figure out how to get the names into the global space in the first place.
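To illustrate the "read dynamically" point, a hypothetical sketch where the dataframes are generated in a loop rather than bound to individual variables, so a dict is the natural container:

```python
import pandas as pd

# Hypothetical example: dataframes created dynamically in a loop,
# keyed by generated names instead of separate df1/df2/... variables.
dataframes = {}
for i in range(1, 4):
    dataframes[f"df{i}"] = pd.DataFrame({'x': [i, i * 10]})

# Select a subset by name, then concatenate.
to_concat = ["df1", "df3"]
result = pd.concat(dataframes[name] for name in to_concat)
```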

4 Comments

Thanks for your answer. This is close, but I want to be able to insert a list in concat as pd.concat(['df1', 'df2']). Not from a dataframe.
You can't pd.concat strings. It's puzzling that you would even include dataframes in the question if you don't want to concatenate them. You said "I want to provide a list of dataframe names as strings", but dataframes don't inherently have a name; only the variables or containers that happen to hold them do. The dict maps a name to a dataframe; then I have a list of names and do the concat.
globals() is a dict, so globals()["df1"] is much the same as dataframes["df1"]. The reasons for using a dedicated dict include (1) it can be populated dynamically and (2) it doesn't hold other unrelated objects. Suppose you want to validate a name before using it for concat: "pd" in dataframes would be False, while "pd" in globals() would be True.
I realize it is unusual, but that's why I asked if it was possible. It's for a small use case and not a larger project so I don't mind the global issue. I greatly appreciate the detail you provided in these comments. Thanks!

Do you want to use strings as variable names? If so, you can do:

str_list = ["df1", "df2"]
pd.concat([locals()[str_list[0]], locals()[str_list[1]]])
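A caveat worth noting (my addition, not part of the answer): as the later comment says, locals() only equals globals() at module level. Inside a function, locals() sees just the function's own variables, so module-level dataframes must be fetched via globals() there. A small sketch:

```python
import pandas as pd

df1 = pd.DataFrame({'x': [1]})
df2 = pd.DataFrame({'x': [2]})

def concat_by_name(names):
    # Inside a function, locals() would not contain df1/df2,
    # which live at module level, so globals() is needed here.
    return pd.concat([globals()[n] for n in names])

result = concat_by_name(["df1", "df2"])
```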

4 Comments

Avoid the globals scope. Use locals or vars...
Thanks for the tip. Just edited the snippet according to your reco.
@Corralien - if the variables are at global scope, you have to use globals. At module level, as in this example locals() is globals().
Thank you. This is what I was looking for! But I realize globals is not recommended.
