1

I have a df with 2 columns like below:

 A B 0 100-00 nan 1 200-00 nan 2 other 300-00 3 100-00 text 4 other nan 

I need to create column C which will apply logic as below:

  • if B is nan, then A
  • if B starts with numeric, then B
  • else A

I've got code as below, which is working just fine, but I believe there might be a better, more efficent way to do it:

C = [] for r in range(df.shape[0]): if df['B'].iloc[r] == 'nan': C.append(df['A'].iloc[r]) elif df['B'].iloc[r][:3].isnumeric(): C.append(df['B'].iloc[r]) else: C.append(df['A'].iloc[r]) df['C'] = C df A B C 0 100-00 nan 100-00 1 200-00 nan 200-00 2 other 300-00 300-00 3 100-00 text 100-00 4 other nan other 

Thanks in advance for all your help.

2 Answers 2

2

I simplify solution for df.B if first character in df.B is numeric tested with Series.str.contains with regex ^ for start of string and \d for number in numpy.where:

df['C'] = np.where(df['B'].str.contains(r'^\d', na=False), df.B, df.A) #alternative #df['C'] = df.B.where(df['B'].str.contains(r'^\d', na=False), df.A) print (df) A B C 0 100-00 NaN 100-00 1 200-00 NaN 200-00 2 other 300-00 300-00 3 100-00 text 100-00 4 other NaN other 
Sign up to request clarification or add additional context in comments.

Comments

1

Not necessarily more efficient, but a more pythonic way to do it

import pandas as pd df = pd.DataFrame({'A': ['100-00', '200-00', 'other', '100-00', 'other'], 'B': ['nan', 'nan', '300-00', 'text', 'nan']}) def label_columnC(row): if row['B'] == 'nan': return row['A'] elif row['B'][:3].isnumeric(): return row['B'] else: return row['A'] df['C'] = df.apply(lambda row: label_columnC(row), axis=1) 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.