Select rows where value of column A starts with value of column B

Question

I have a pandas DataFrame and want to select the rows where the value in one column starts with the value in another column. I have tried the following:

    import pandas as pd

    df = pd.DataFrame({'A': ['apple', 'xyz', 'aa'], 'B': ['app', 'b', 'aa']})
    df_subset = df[df['A'].str.startswith(df['B'])]

But it raises the error below, and the solutions I have found have not helped either.

KeyError: "None of [Float64Index([nan, nan, nan], dtype='float64')] are in the [columns]"

np.where(df['A'].str.startswith(df['B']), True, False), which I found elsewhere, also returns True for all rows.

Erfan · Accepted Answer · 2020-06-23 13:56:48Z

For a row-wise comparison, we can use DataFrame.apply:

    m = df.apply(lambda x: x['A'].startswith(x['B']), axis=1)
    df[m]

           A    B
    0  apple  app
    2     aa   aa

Your code does not work because Series.str.startswith accepts a character sequence (a string scalar), but you are passing it a pandas Series. Quoting the docs:

pat : str
Character sequence. Regular expressions are not accepted.
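To make the quoted signature concrete, here is a minimal sketch of what a scalar pattern does, assuming the same sample frame as in the question. The prefix 'ap' is only an illustrative value, not something from the original post.

```python
import pandas as pd

df = pd.DataFrame({'A': ['apple', 'xyz', 'aa'], 'B': ['app', 'b', 'aa']})

# A scalar pattern is tested against every row of column A.
# The same prefix ('ap', chosen for illustration) is used for all rows,
# so this cannot express "row i of A starts with row i of B".
scalar_mask = df['A'].str.startswith('ap')

subset = df[scalar_mask]
```

Because the prefix is fixed for the whole column, a per-row prefix needs one of the row-wise approaches shown in the answers.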

Brilliant! I did try apply with a lambda too but failed to get it to work; I was missing the axis=1.
Yes, that can be confusing at first. The idea is that you want to apply your function to each row (so across the column axis) and not to each column (which runs along the index axis). In this case, axis='columns' would also work.
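The axis discussion in the comments can be checked with a short sketch, again assuming the sample frame from the question. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': ['apple', 'xyz', 'aa'], 'B': ['app', 'b', 'aa']})

# axis=1 and axis='columns' are two spellings of the same thing:
# the lambda receives one row at a time, labelled by column name.
by_number = df.apply(lambda row: row['A'].startswith(row['B']), axis=1)
by_name = df.apply(lambda row: row['A'].startswith(row['B']), axis='columns')

# With the default axis=0 the lambda receives a whole column, labelled by
# the row index (0, 1, 2), so looking up 'A' on it fails with a KeyError.
try:
    df.apply(lambda col: col['A'].startswith(col['B']))
    default_axis_failed = False
except KeyError:
    default_axis_failed = True
```

This is likely why an apply call without axis=1 fails: the function is handed columns, not rows.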

BENY · Accepted Answer · 2020-06-23 13:45:32Z

You may need a for loop, since str.startswith does not support a row-wise check:

    [x.startswith(y) for x, y in zip(df.A, df.B)]
    Out[380]: [True, False, True]

    df_sub = df[[x.startswith(y) for x, y in zip(df.A, df.B)]].copy()
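The list comprehension above assumes every cell is a string. A hedged variant, in case the columns may contain missing values or the frame has a non-default index, might look like the sketch below. The helper name starts_with_column, the None entry, and the index labels are all hypothetical and not part of the original answer.

```python
import pandas as pd

# Sample data modified for illustration: one missing value, custom index.
df = pd.DataFrame(
    {'A': ['apple', 'xyz', None], 'B': ['app', 'b', 'aa']},
    index=[10, 20, 30],
)

def starts_with_column(frame, col, prefix_col):
    """Hypothetical helper: row-wise startswith; non-strings never match."""
    flags = [
        isinstance(a, str) and isinstance(b, str) and a.startswith(b)
        for a, b in zip(frame[col], frame[prefix_col])
    ]
    # Reuse the frame's index so the mask is labelled like its rows.
    return pd.Series(flags, index=frame.index)

mask = starts_with_column(df, 'A', 'B')
df_sub = df[mask].copy()
```

Treating missing values as "no match" is a design choice made here for the sketch; dropping or flagging those rows separately may suit other cases better.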

Woods Chen · Accepted Answer · 2020-06-23 14:00:50Z

You can achieve this without using a for loop:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({'A': ['apple', 'xyz', 'aa'], 'B': ['app', 'b', 'aa']})
    ufunc = np.frompyfunc(str.startswith, 2, 1)
    idx = ufunc(df['A'], df['B'])
    df[idx]

    Out[22]:
           A    B
    0  apple  app
    2     aa   aa
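One detail worth knowing about this approach: np.frompyfunc returns object-dtype results, and it still calls the Python function once per element, so it mainly saves writing the loop rather than the work. A minimal sketch that casts the result to a real boolean mask, assuming the same sample frame (the name starts_with is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['apple', 'xyz', 'aa'], 'B': ['app', 'b', 'aa']})

# Wrap str.startswith as a ufunc taking 2 inputs and returning 1 output.
starts_with = np.frompyfunc(str.startswith, 2, 1)

# The raw result is an object array of Python bools; cast it explicitly
# so the mask has a proper boolean dtype before indexing.
raw = starts_with(df['A'].to_numpy(), df['B'].to_numpy())
mask = raw.astype(bool)

result = df[mask]
```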
