Python: pandas DataFrame new column based on other columns

Question

I have a df with 2 columns like below:

 A B 0 100-00 nan 1 200-00 nan 2 other 300-00 3 100-00 text 4 other nan

I need to create column C which will apply logic as below:

if B is nan, then A
if B starts with numeric, then B
else A

I've got code as below, which is working just fine, but I believe there might be a better, more efficent way to do it:

C = [] for r in range(df.shape[0]): if df['B'].iloc[r] == 'nan': C.append(df['A'].iloc[r]) elif df['B'].iloc[r][:3].isnumeric(): C.append(df['B'].iloc[r]) else: C.append(df['A'].iloc[r]) df['C'] = C df A B C 0 100-00 nan 100-00 1 200-00 nan 200-00 2 other 300-00 300-00 3 100-00 text 100-00 4 other nan other

Thanks in advance for all your help.

jezrael · Accepted Answer · 2020-04-27 09:48:31Z

I simplify solution for df.B if first character in df.B is numeric tested with Series.str.contains with regex ^ for start of string and \d for number in numpy.where:

df['C'] = np.where(df['B'].str.contains(r'^\d', na=False), df.B, df.A) #alternative #df['C'] = df.B.where(df['B'].str.contains(r'^\d', na=False), df.A) print (df) A B C 0 100-00 NaN 100-00 1 200-00 NaN 200-00 2 other 300-00 300-00 3 100-00 text 100-00 4 other NaN other

Pani · Accepted Answer · 2020-04-27 09:57:47Z

Not necessarily more efficient, but a more pythonic way to do it

import pandas as pd df = pd.DataFrame({'A': ['100-00', '200-00', 'other', '100-00', 'other'], 'B': ['nan', 'nan', '300-00', 'text', 'nan']}) def label_columnC(row): if row['B'] == 'nan': return row['A'] elif row['B'][:3].isnumeric(): return row['B'] else: return row['A'] df['C'] = df.apply(lambda row: label_columnC(row), axis=1)

Collectives™ on Stack Overflow

Python: pandas DataFrame new column based on other columns

2 Answers 2

Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Related