
I have 15 CSV files, one of whose columns represents the year. The problem is that this column is named 'year' in some files and 'year_' in the others, so after combining the files I have two columns holding the same information. Since each file has only one of the two names, a row with a value in 'year' has NaN in 'year_', and vice versa. I want to combine those two columns so that I can get rid of the NaNs. What is the best way to do this?

Before

   year  year_
1   NaN   1999
2  2002    NaN
3  2000    NaN
...
N   NaN   2004

I want this to be

After

   year
1  1999
2  2002
3  2000
...
N  2004
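(Aside: the split columns can also be avoided entirely by normalizing the column name as each file is read, before concatenating. A minimal sketch, using in-memory CSV buffers as stand-ins for the 15 files; the column values are illustrative:)

```python
import io

import pandas as pd

# Two hypothetical CSV snippets: one file uses 'year', the other 'year_'
csv_a = io.StringIO("id,year\n1,2002\n2,2000\n")
csv_b = io.StringIO("id,year_\n3,1999\n4,2004\n")

frames = []
for buf in (csv_a, csv_b):
    frame = pd.read_csv(buf)
    # Normalize the column name up front so every frame
    # contributes to a single 'year' column
    frames.append(frame.rename(columns={"year_": "year"}))

df = pd.concat(frames, ignore_index=True)
# df now has one 'year' column with no NaNs from the name mismatch
```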


You can use the combine_first function:

df['year'] = df['year'].combine_first(df['year_'])

where df['year'] is the default and df['year_'] is used to fill its null values.
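A runnable sketch of this answer; the sample frame reproduces the question's "Before" table with illustrative values:

```python
import numpy as np
import pandas as pd

# Sample frame matching the question's "Before" table
df = pd.DataFrame({
    "year":  [np.nan, 2002, 2000, np.nan],
    "year_": [1999, np.nan, np.nan, 2004],
})

# combine_first keeps non-null values from df['year'] and fills
# its NaNs from df['year_']
df["year"] = df["year"].combine_first(df["year_"])
df = df.drop(columns="year_")
```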


Seems to be faster than the sum solution.

Given that only one of the two columns has a valid value in each row, you can simply sum them along axis 1:

year_cols = df.columns[df.columns.str.contains('year')]
df['year'] = df[year_cols].sum(axis=1)
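The same approach end to end, on an illustrative frame matching the question's "Before" table:

```python
import numpy as np
import pandas as pd

# Sample frame matching the question's "Before" table
df = pd.DataFrame({
    "year":  [np.nan, 2002, 2000, np.nan],
    "year_": [1999, np.nan, np.nan, 2004],
})

# Select every column whose name contains 'year'
year_cols = df.columns[df.columns.str.contains("year")]
# sum skips NaN by default, so each row yields its single valid year
df["year"] = df[year_cols].sum(axis=1)
df = df.drop(columns="year_")
```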


Same idea as @Vaishali's: sum the year columns, but use filter to select them:

df.filter(like='year').sum(axis=1) 
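For completeness, a sketch of the filter variant on the same illustrative frame, assigning the result back and dropping the redundant column:

```python
import numpy as np
import pandas as pd

# Illustrative frame matching the question's "Before" table
df = pd.DataFrame({
    "year":  [np.nan, 2002, 2000, np.nan],
    "year_": [1999, np.nan, np.nan, 2004],
})

# filter(like='year') picks every column whose name contains 'year';
# sum skips NaN, so each row keeps its single valid value
df["year"] = df.filter(like="year").sum(axis=1)
df = df.drop(columns="year_")
```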

