I want to create two binary indicators by checking whether the characters in the first and third positions of column 'A' match the characters in the first and third positions of column 'B'.

Here is a sample data frame:

df = pd.DataFrame({'A' : ['a%d', 'a%', 'i%'], 'B' : ['and', 'as', 'if']})

     A    B
0  a%d  and
1   a%   as
2   i%   if

I would like the data frame to look like below:

     A    B  Match_1  Match_3
0  a%d  and        1        1
1   a%   as        1        0
2   i%   if        1        0

I tried the following string comparison, but the match_1 column just contains 0 for every row.

df['match_1'] = np.where(df['A'][0] == df['B'][0], 1, 0) 

I am wondering if there is a function that is similar to the substr function found in SQL.

  • Try pd.Series.str.get, e.g. np.where(df['A'].str.get(0) == df['B'].str.get(0), 1, 0) Commented Oct 26, 2021 at 15:41

1 Answer

You can use the pandas .str accessor, which supports indexing and slicing into each element:

df['match_1'] = df['A'].str[0].eq(df['B'].str[0]).astype(int)
df['match_3'] = df['A'].str[2].eq(df['B'].str[2]).astype(int)

output:

     A    B  match_1  match_3
0  a%d  and        1        1
1   a%   as        1        0
2   i%   if        1        0
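
It may help to see why the attempt in the question produced only zeros, and why the short strings in rows 1 and 2 end up as 0 in match_3. The sketch below reuses the sample data frame from the question; the variable names (scalar_result, first_chars, third_chars, match_3) are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a%d', 'a%', 'i%'], 'B': ['and', 'as', 'if']})

# df['A'][0] selects the first ROW ('a%d'), not the first character of
# each row, so this compares two whole strings and yields a single bool.
scalar_result = df['A'][0] == df['B'][0]

# .str[0] indexes into every string element-wise instead.
first_chars = df['A'].str[0]

# A position past the end of a string gives a missing value, not an error.
third_chars = df['A'].str[2]

# Missing values never compare equal, so rows that are too short
# are counted as non-matches.
match_3 = third_chars.eq(df['B'].str[2]).astype(int)
```

Because the original comparison is a single False, np.where turns it into a single 0, which pandas then broadcasts down the whole column.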

If you have many positions to test, you can use a loop:

for pos in (1, 3):
    df['match_%d' % pos] = df['A'].str[pos-1].eq(df['B'].str[pos-1]).astype(int)
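
On the SQL substr analogy from the question: pandas has no function of that name, but Series.str.slice (or slice notation on .str) extracts multi-character substrings. A minimal sketch follows, assuming a 1-based start as in SQL; the substr helper and the match_1_2 column are hypothetical names used only for this example, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a%d', 'a%', 'i%'], 'B': ['and', 'as', 'if']})

def substr(series, start, length):
    """Hypothetical helper: like SQL SUBSTR(col, start, length), 1-based start."""
    # Convert the 1-based start to Python's 0-based slice bounds.
    return series.str.slice(start - 1, start - 1 + length)

# Compare the first two characters of each column in one step.
df['match_1_2'] = substr(df['A'], 1, 2).eq(substr(df['B'], 1, 2)).astype(int)
```

Slices that run past the end of a string simply return whatever characters are available, so no length check is needed before calling the helper.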