
I have two DataFrames, and I'd like to extend the first one with the matching data from the second one, based on their index. The catch is that I do this iteratively, and the column label of only the first df increases by one with each iteration, which causes an error.

For example, the first df after the first iteration:

             0
    440  7.691

The second df after the first iteration (it doesn't change between iterations):

         1
    0    M
    1    M
    2    M
    3    M
    4    M
    ..  ..
    440  B
    441  M
    442  M

When I run the code, I get the df I want:

    df_with_label = first_df.join(self.second_df)

             0  1
    440  7.691  B

After the second iteration, my first df is now:

           1
    3  10.72

When I run the same df_with_label = first_df.join(self.second_df), I'd like to get:

           1  2
    3  10.72  M

But I get this error instead:

ValueError: columns overlap but no suffix specified: Int64Index([1], dtype='int64') 

I'm guessing the problem is that the column label of the first df is 1 after the second iteration, but I don't know how to fix it. I'd like the column label of the first df to keep increasing.

The best solution would be to give the second column a different name, like this:

           1 class
    3  10.72     M

Any idea how to fix this?

3
  • Can you show an example of your df and your expected output? Commented Sep 3, 2019 at 9:35
  • I gave an example of both dfs for 2 iterations. Is it not clear enough? Commented Sep 3, 2019 at 9:37
  • Surely it is not necessary to iterate. To make the question easier to understand, you should create an input dataframe and the corresponding output that you expect to obtain. Commented Sep 3, 2019 at 9:40

2 Answers

1

If I understand correctly, your second dataframe doesn't change between iterations, so why not change its column name once and for all:

second_df.columns=['colname'] 

This should solve your naming conflicts.
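
As a minimal sketch of that approach, the snippet below renames the label column once and then joins inside a loop. The sample values come from the question; the name "class", the `iterations` list, and the trimmed-down `second_df` are assumptions made only for illustration.

```python
import pandas as pd

# Trimmed-down stand-in for the label frame from the question (single column named 1).
second_df = pd.DataFrame({1: ["M", "M", "M", "M", "B"]}, index=[0, 1, 2, 3, 440])

# Rename once, before the loop, so the label column can never clash
# with the integer column labels of the first frame.
second_df.columns = ["class"]

# Hypothetical per-iteration frames: the column label grows by one each time.
iterations = [
    pd.DataFrame({0: [7.691]}, index=[440]),
    pd.DataFrame({1: [10.72]}, index=[3]),
]

# join() matches rows on the index, so each row picks up its own label.
results = [first_df.join(second_df) for first_df in iterations]

for df_with_label in results:
    print(df_with_label)
```

Because the rename happens outside the loop, the first frame's column label can keep increasing without ever overlapping with "class".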


2 Comments

Thanks. That's a possibility that worked. Is there a way to reset the column label of the first df at each iteration, if I'd like to go in that direction?
I'm not sure how your loop works, but you should be able to update the column name of your dataframe at each iteration by using the same command.
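
A rough sketch of what that could look like, assuming the first frame always has exactly one column. The helper `label_iteration` and the fixed name "value" are hypothetical, not taken from the question.

```python
import pandas as pd

# Stand-in for the unchanged label frame (single column named 1).
second_df = pd.DataFrame({1: ["M", "M", "M", "M"]}, index=[0, 1, 2, 3])


def label_iteration(first_df):
    """Give the first frame a fixed column name, then join the labels on the index."""
    # Work on a copy so the caller's frame keeps its increasing column label.
    renamed = first_df.copy()
    renamed.columns = ["value"]  # hypothetical fixed name, assumes a single column
    return renamed.join(second_df)


# Second iteration from the question: the column label is 1, which would clash.
first_df = pd.DataFrame({1: [10.72]}, index=[3])
df_with_label = label_iteration(first_df)
print(df_with_label)
```

The original frame is left untouched here, so the increasing label is still available if it carries meaning elsewhere in the loop.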
1

Try:

df_with_label = first_df.join(self.second_df, rsuffix = "_2") 

The thing is that first_df and second_df both have a column named 1, so rsuffix appends "_2" to the second_df column name: "1" becomes "1_2". The join is on the index, so every other column is included by default, which means you need to avoid naming conflicts.

REF https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html
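
To illustrate, here is a small reproduction using the second-iteration values from the question. The frames are trimmed-down stand-ins, and the column labels are compared as strings because the exact label type after suffixing may depend on the pandas version.

```python
import pandas as pd

# Second iteration from the question: both frames have a column labelled 1.
first_df = pd.DataFrame({1: [10.72]}, index=[3])
second_df = pd.DataFrame({1: ["M", "M", "M", "M"]}, index=[0, 1, 2, 3])

# Without a suffix, the overlapping label raises the error from the question.
try:
    first_df.join(second_df)
    raised = False
except ValueError as err:
    raised = True
    print(err)

# With rsuffix, the overlapping right-hand column is renamed instead.
df_with_label = first_df.join(second_df, rsuffix="_2")
print(df_with_label)
print([str(c) for c in df_with_label.columns])
```

On iterations where the labels do not overlap, such as the first one with column 0, the suffix is not applied, so the label column's name can differ from one iteration to the next.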
