I am working in Python and have a large dataset in a pandas dataframe. I have taken a section of this data and put it into another dataframe, where I have created a new column and populated it. I now want to put this new column back into the original dataframe, overwriting one of the existing columns, but only for the section I have edited.
Could you advise how this is best done? The only unique identifier is the automatically generated index. The second dataframe has kept the same index values as the larger one, so it should be quite straightforward, but I cannot work out how to a) reference the automatically created index values, and b) use them to overwrite the existing data in the column with data from the other dataframe.
So, it should be something like this (I realise this is a mashup of syntax but just trying to better explain what I am trying to do!):
    where df1.ROW.INDEX == df2.ROW.INDEX
    insert into df1['col_name'].value from df2.['col_name'].value

Any help would be greatly appreciated.
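The pseudocode above maps fairly directly onto label-based assignment with `.loc`. The following is only a minimal sketch under assumptions: the data, the `name` column, and the `new_col` column name are made up for illustration, and the default integer index stands in for the automatically generated one.

```python
import pandas as pd

# Hypothetical data; the default index 0..4 plays the role of the auto-generated index.
df1 = pd.DataFrame({"name": ["a", "b", "c", "d", "e"],
                    "col_name": [10, 20, 30, 40, 50]})

# Take a section of the rows. The copy keeps the original index labels (1, 2, 3).
df2 = df1.iloc[1:4].copy()
df2["new_col"] = df2["col_name"] * 2  # hypothetical edit

# df2.index holds the row labels of the edited section. .loc selects the same
# labels in df1, and the right-hand Series is aligned on its index before assignment.
df1.loc[df2.index, "col_name"] = df2["new_col"]

print(df1)
```

Rows of `df1` whose labels are not in `df2.index` are left untouched, and no explicit loop is needed.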
UPDATE: I now have this code which almost works:
    index_values = edited_df.index.values
    for i in index_values:
        main_df.iloc[i]['pop'] = edited_df.iloc[i]['new_col']

I get a caveats error, and the main_df is not changed. It looks like it is making copies in each iteration rather than updating the main dataframe.
UPDATE: FIXED. I finally managed to work out the kinks; the solution is below for anyone who has a similar problem.
    index_values = edited_df.index.values
    for i in index_values:
        main_df.iloc[i, main_df.columns.get_loc('pop')] = edited_df.iloc[i]['new_col']
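One thing to be aware of with the loop above: `iloc` is positional, so it relies on each index label matching the row position in both `main_df` and `edited_df`. A possible alternative that aligns on index labels instead is `DataFrame.update`. This is a hedged sketch, not a drop-in replacement: the data and the `city` column are invented for illustration, and the section is deliberately taken from the middle of the frame.

```python
import pandas as pd

# Hypothetical data with the default auto-generated index 0..4.
main_df = pd.DataFrame({"city": ["a", "b", "c", "d", "e"],
                        "pop": [100, 200, 300, 400, 500]})

# A section from the middle of the frame; its index labels are 2 and 3.
edited_df = main_df.iloc[2:4].copy()
edited_df["new_col"] = edited_df["pop"] + 1  # hypothetical edit

# Rename the new column so it matches the target column, then update in place.
# update() aligns on index labels and column names, so only rows 2 and 3 change.
main_df.update(edited_df[["new_col"]].rename(columns={"new_col": "pop"}))

print(main_df)
```

Note that `update` skips missing values in the frame passed to it, so NaN entries in `new_col` would leave the original `pop` values in place.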