pandas: combine two columns in a DataFrame

Question

I have a pandas DataFrame that has multiple columns in it:

Index: 239897 entries, 2012-05-11 15:20:00 to 2012-06-02 23:44:51
Data columns:
foo         11516 non-null values
bar         228381 non-null values
Time_UTC    239897 non-null values
dtstamp     239897 non-null values
dtypes: float64(4), object(1)

where foo and bar are columns which contain the same data yet are named differently. Is there a way to move the rows which make up foo into bar, ideally whilst maintaining the name of bar?

In the end the DataFrame should appear as:

Index: 239897 entries, 2012-05-11 15:20:00 to 2012-06-02 23:44:51
Data columns:
bar         239897 non-null values
Time_UTC    239897 non-null values
dtstamp     239897 non-null values
dtypes: float64(4), object(1)

That is, the NaN values that made up bar are replaced by the values from foo.

BrenBarn · Accepted Answer · 2012-06-10 21:38:40Z

22

Try this:

pandas.concat([df['foo'].dropna(), df['bar'].dropna()]).reindex_like(df)

If you want that data to become the new column bar, just assign the result to df['bar'].
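
To make that concrete, here is a minimal sketch on a tiny made-up frame (the values and the four-row index are assumptions, not the asker's data). It relies on foo and bar never both being non-null in the same row; if they overlapped, the concatenated Series would contain duplicate labels and the reindex step would fail.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: foo and bar never overlap.
df = pd.DataFrame(
    {"foo": [1.0, np.nan, np.nan, 4.0], "bar": [np.nan, 2.0, 3.0, np.nan]},
    index=pd.date_range("2012-05-11 15:20:00", periods=4, freq="min"),
)

# Stack the non-null values of both columns, then restore the original row order.
combined = pd.concat([df["foo"].dropna(), df["bar"].dropna()]).reindex_like(df)

# Store the result as the new bar and drop the now-redundant foo.
df["bar"] = combined
df = df.drop(columns="foo")

print(df["bar"].tolist())  # [1.0, 2.0, 3.0, 4.0]
```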


3 Comments

BFTM Over a year ago

I am not seeing concat as a function in the pandas namespace; I'm not sure what I am missing.

BrenBarn Over a year ago

What version of pandas do you have? The function is documented here: pandas.pydata.org/pandas-docs/stable/…

BFTM Over a year ago

I was running pandas ver 0.6.1 which doesn't have the concat function included. An upgrade to v 0.7.3 brings concat into the namespace. Works like a charm! Thanks.

user1883737 · Accepted Answer · 2014-05-21 15:56:05Z

You can use fillna directly and assign the result to the column 'bar':

df['bar'] = df['bar'].fillna(df['foo'])
del df['foo']

general example:

import pandas as pd

# creating the table with two missing values
df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=[1, 2])
df2 = pd.DataFrame({'b': [5, 6]}, index=[3, 4])
dftot = pd.concat((df1, df2))
print(dftot)

# creating the dataframe to fill the missing values
filldf = pd.DataFrame({'a': [7, 7, 7, 7]})

# filling
print(dftot.fillna(filldf))

But do note that since filldf is indexed 0..3 while dftot is indexed 1..4, dftot.fillna(filldf)['a'][4] will be NaN, not 7.0.
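
The same alignment rule applies when filling one column from another, so here is a small sketch of both cases. The frame, its index labels and the `other` Series are made up for illustration. Two columns of the same frame always share an index, which is why the foo/bar case fills every gap.

```python
import numpy as np
import pandas as pd

# Made-up frame for illustration; foo and bar share the frame's index.
df = pd.DataFrame(
    {"foo": [10.0, np.nan, 30.0], "bar": [np.nan, 2.0, np.nan]},
    index=[5, 6, 7],
)

# Filling from a column of the same frame lines up label by label.
same_index = df["bar"].fillna(df["foo"])

# Filling from a Series with different labels only fills where labels match.
other = pd.Series([99.0, 99.0, 99.0], index=[0, 1, 5])
shifted = df["bar"].fillna(other)

print(same_index.tolist())  # [10.0, 2.0, 30.0]
print(shifted.tolist())     # [99.0, 2.0, nan] (label 7 has no match in other)
```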

dagrha · Accepted Answer · 2016-11-30 00:57:03Z

More modern pandas versions (since at least 0.12) have the combine_first() and update() methods for DataFrame and Series objects. For example if your DataFrame were called df, you would do:

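df['bar'] = df['bar'].combine_first(df['foo'])

which fills only the NaN values of the bar column with the matching values from foo. Note that combine_first() returns a new Series rather than changing bar in place, so the result has to be assigned back. To overwrite non-NaN values in bar with those in foo, you would use the update() method, which modifies the object it is called on and uses only the non-NaN values of foo.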
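
Here is a rough side-by-side of the two methods on made-up data. As far as I know, combine_first() returns a new Series and leaves its caller untouched, while update() changes the Series it is called on; the sketch calls update() on an explicit copy so the effect is easy to see.

```python
import numpy as np
import pandas as pd

# Made-up data: bar is missing in row 0, foo is missing in row 1.
df = pd.DataFrame({"foo": [1.0, np.nan, 3.0], "bar": [np.nan, 20.0, 30.0]})

# combine_first: keep bar where it has a value, fall back to foo where bar is NaN.
filled = df["bar"].combine_first(df["foo"])

# update: overwrite with every non-NaN value from foo.
# Done on an explicit copy so the original column stays available for comparison.
overwritten = df["bar"].copy()
overwritten.update(df["foo"])

print(filled.tolist())       # [1.0, 20.0, 30.0]
print(overwritten.tolist())  # [1.0, 20.0, 3.0]
print(df["bar"].tolist())    # [nan, 20.0, 30.0] (unchanged until assigned back)
```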

van_d39 · Accepted Answer · 2016-11-30 00:52:06Z

Another option: use the .apply() method on the frame. You can reassign a column with deference to existing data...

import pandas as pd

# get your data into a dataframe

# replace content in "bar" with "foo" if "bar" is null
# (NaN never compares equal to itself, so test with pd.isnull rather than ==)
df["bar"] = df.apply(lambda row: row["foo"] if pd.isnull(row["bar"]) else row["bar"], axis=1)

# note: if your missing values are some other marker, like an empty string,
# compare against that instead, e.g. row["bar"] == ""
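
One thing worth checking with this approach is how the null test is written. NaN does not compare equal to anything, itself included, so an equality test never fires and the column comes back unchanged. A quick sketch on made-up data (the two-row frame is an assumption for illustration):

```python
import pandas as pd

nan = float("nan")

# NaN is never equal to anything, including itself.
print(nan == nan)      # False
print(pd.isnull(nan))  # True

df = pd.DataFrame({"foo": [1.0, 2.0], "bar": [nan, 5.0]})

# Equality test: never true, so the NaN in bar survives.
by_equality = df.apply(
    lambda row: row["foo"] if row["bar"] == nan else row["bar"], axis=1
)

# pd.isnull test: the NaN in bar is replaced by foo.
by_isnull = df.apply(
    lambda row: row["foo"] if pd.isnull(row["bar"]) else row["bar"], axis=1
)

print(by_equality.tolist())  # [nan, 5.0]
print(by_isnull.tolist())    # [1.0, 5.0]
```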

van_d39 · Accepted Answer · 2016-12-01 03:51:41Z

You can do this using numpy too.

df['bar'] = np.where(pd.isnull(df['bar']),df['foo'],df['bar'])
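
For completeness, a self-contained version of the same line with its imports, run on made-up data. Keep in mind that np.where works position by position rather than by index label, which is fine here because both columns come from the same frame.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"foo": [1.0, 2.0, 3.0], "bar": [np.nan, 20.0, np.nan]})

# Where bar is null take foo, otherwise keep bar.
df["bar"] = np.where(pd.isnull(df["bar"]), df["foo"], df["bar"])

# foo is no longer needed once its values have been merged into bar.
df = df.drop(columns="foo")

print(df["bar"].tolist())  # [1.0, 20.0, 3.0]
```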
