python/dataframe - merge duplicated rows

I have a dataframe like this:

id  year  data_1  data_2
A   2019  nan     11
A   2019  abc     11
A   2020  nan     22
B   2019  345     nan
B   2019  nan     456
B   2020  234     33

I want to identify rows that are duplicated on some columns ("id" and "year" in this case) and merge their remaining columns, i.e. for each column of a given id and year, keep the non-np.nan value:

id  year  data_1  data_2
A   2019  abc     11
A   2020  nan     22
B   2019  345     456
B   2020  234     33

I can find all the duplicated rows (which is easy), but I can't think of how to "merge" them by replacing np.nan with the available values.
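
For reference, here is a minimal sketch of the setup and of one possible way to get the desired result. It assumes the dataframe is a pandas DataFrame; the names `df`, `dupes` and `merged` are only illustrative. It relies on `GroupBy.first()`, which returns the first non-null value of each column within a group, so it may not be what is wanted if one group can hold two different non-null values in the same column.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (data_1 mixes strings and numbers).
df = pd.DataFrame({
    "id": ["A", "A", "A", "B", "B", "B"],
    "year": [2019, 2019, 2020, 2019, 2019, 2020],
    "data_1": [np.nan, "abc", np.nan, 345, np.nan, 234],
    "data_2": [11, 11, 22, np.nan, 456, 33],
})

# Rows that share their ("id", "year") key with at least one other row.
dupes = df[df.duplicated(["id", "year"], keep=False)]

# One row per ("id", "year"): first() takes the first non-null value per column,
# and leaves a missing value where every row in the group is null.
merged = df.groupby(["id", "year"], as_index=False).first()
print(merged)
```

When a group contains conflicting non-null values, `first()` silently keeps the earliest one, so checking `dupes` for such conflicts beforehand may be worthwhile.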