0

I have over 1500 pandas dataframes that I need to combine into one large dataframe. The issue is that the dataframes have unique column headers and different sizes.

For example dataframe 1 is:

type  sc98*c.firstname  sc98*c.lastname  sc98*c.username  text               createdAt  statusofExpiration
need  John              Doe              johndoe          I need a new car.  111111     expired

And dataframe 2 is:

type  l8!7s4fn.firstname  l8!7s4fn.lastname  l8!7s4fn.username  text                    tags.0  tags.1   image.0        createdAt  statusOfExpiration
need  Matt                Smith              mattsmith          I need a yoga trainer.  yoga    trainer  blankurl.com/  22222      fulfilled

And I want to end up with a data frame like:

type  firstname  lastname  username   text                    createdAt  statusofExpiration  tags.0  tags.1   image.0
need  John       Doe       johndoe    I need a new car.       111111     expired
need  Matt       Smith     mattsmith  I need a yoga trainer.  222222     fulfilled           yoga    trainer  blankurl.com/

As I mentioned, I can't select the values by index because the dataframes vary in size, and I can't select them by column name because each dataframe has a unique identifier (e.g. id.username) in its column headers.

Is there any way to get around this problem?

11
  • 4
    Don't post dataframes as images. Commented Aug 8, 2018 at 19:18
  • 1
    Possible duplicate of Merge multiple dataframe pandas Commented Aug 8, 2018 at 19:18
  • 1
    It's a bit more than (stackoverflow.com/questions/51115262/…) as the column names are not exactly the same; look at the one with firstname. The columns need to be renamed first. Commented Aug 8, 2018 at 19:24
  • 1
    @Ben.T I can't view the images unfortunately so without that further information it looked like a dupe to me :/ I think the question would be made a lot better with the images either embedded or formatted into the text body Commented Aug 8, 2018 at 19:28
  • 1
    @Turtle Maybe try the code formatting (the { } button just above the text area when you write a question) to make the lines of your data look better. Commented Aug 8, 2018 at 19:36

2 Answers

0

Since the data frames have unique column headers and different sizes, there is no simple way to concatenate them directly. I would recommend looking into the following:

df.filter(like='firstname') # select columns containing the word firstname 

This way you can loop through the column names in all of the data frames and rename them based on partial matches.
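A minimal sketch of that rename-then-concatenate idea follows. It is not the only way to do it: instead of looping with df.filter, it passes a function to DataFrame.rename, which amounts to the same partial matching. The two sample frames, the PREFIXED_FIELDS tuple, and the strip_id_prefix helper are hypothetical names made up for illustration, and the sketch assumes that only firstname, lastname, and username carry an ID prefix.

```python
import pandas as pd

# Hypothetical sample frames modeled on the question's examples.
# Note: concat matches column names case-sensitively, so this sample
# uses a single spelling of statusofExpiration in both frames.
df1 = pd.DataFrame([{
    "type": "need",
    "sc98*c.firstname": "John",
    "sc98*c.lastname": "Doe",
    "sc98*c.username": "johndoe",
    "text": "I need a new car.",
    "createdAt": 111111,
    "statusofExpiration": "expired",
}])
df2 = pd.DataFrame([{
    "type": "need",
    "l8!7s4fn.firstname": "Matt",
    "l8!7s4fn.lastname": "Smith",
    "l8!7s4fn.username": "mattsmith",
    "text": "I need a yoga trainer.",
    "tags.0": "yoga",
    "tags.1": "trainer",
    "image.0": "blankurl.com/",
    "createdAt": 22222,
    "statusofExpiration": "fulfilled",
}])

# Assumption: only these fields are prefixed with a per-frame ID.
PREFIXED_FIELDS = ("firstname", "lastname", "username")


def strip_id_prefix(column):
    """Return the bare field name if the column ends in '.<field>'."""
    for field in PREFIXED_FIELDS:
        if column.endswith("." + field):
            return field
    return column  # leave every other column name untouched


# Rename each frame, then concatenate once; columns that a frame
# lacks (tags.0, image.0, ...) are filled with NaN for its rows.
frames = [df.rename(columns=strip_id_prefix) for df in (df1, df2)]
combined = pd.concat(frames, ignore_index=True, sort=False)
print(combined)
```

With 1500 frames, building the full list first and calling pd.concat once is generally cheaper than concatenating inside a loop.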

Look into this post: Pandas rename colums with wildcard


0

You can concatenate multiple data frames like this. Hope this helps!

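import pandas as pd

# Placeholder data; substitute your own lists.
firstname_list = ['John', 'Matt']
lastname_list = ['Doe', 'Smith']
value_list1 = [1, 2]
value_list2 = [3, 4]

df1 = pd.DataFrame({'First Name': firstname_list, 'Last Name': lastname_list})
df2 = pd.DataFrame({'Key1': value_list1, 'Key2': value_list2})

# concat aligns on column names; columns missing from a frame are filled with NaN
frames = [df1, df2]
concatenated_df = pd.concat(frames)
concatenated_df.to_csv(r'dataset.csv', sep=',', index=False)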
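To see what pd.concat does when the frames do not share all their columns, here is a small sketch. The frames a and b and their values are made up for illustration. The result holds the union of the columns, and cells a frame has no value for come out as NaN, which is why the column names have to match exactly before concatenating.

```python
import pandas as pd

# Two hypothetical frames that share only the "firstname" column.
a = pd.DataFrame({"firstname": ["John"], "text": ["I need a new car."]})
b = pd.DataFrame({"firstname": ["Matt"], "tags.0": ["yoga"]})

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 0.
stacked = pd.concat([a, b], ignore_index=True, sort=False)
print(stacked)

# Which rows have no value in each non-shared column.
print(stacked[["text", "tags.0"]].isna())
```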
