find position of column string in another column using Pandas

Question

I have a DataFrame with two columns:

       col1   col2
    1  cat    the cat
    2  dog    a nice dog
    3  horse  horse is here

I need to find the position of each string of col1 in col2.

Solution must be:

       col1   col2           col3
    1  cat    the cat        4
    2  dog    a nice dog     7
    3  horse  horse is here  0

There must be a simple solution to do this without using painful loops, but I can't find it.

piRSquared · Accepted Answer · 2018-10-11 16:46:41Z

8

`numpy.core.defchararray.find`

    from numpy.core.defchararray import find

    a = df.col2.values.astype(str)
    b = df.col1.values.astype(str)
    df.assign(col3=find(a, b))

       col1   col2           col3
    1  cat    the cat        4
    2  dog    a nice dog     7
    3  horse  horse is here  0
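On recent NumPy releases the same element-wise search can be reached through the public `numpy.char` namespace. This is useful if importing from `numpy.core` emits a deprecation warning in your installation, which it does from NumPy 2.0 onward. The following is a minimal sketch, not part of the original answer; the DataFrame is rebuilt from the question's sample data and the variable names `haystack` and `needle` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question (index 1..3 as shown there).
df = pd.DataFrame(
    {
        "col1": ["cat", "dog", "horse"],
        "col2": ["the cat", "a nice dog", "horse is here"],
    },
    index=[1, 2, 3],
)

# Convert both columns to fixed-width NumPy string arrays.
haystack = df["col2"].to_numpy().astype(str)
needle = df["col1"].to_numpy().astype(str)

# np.char.find is element-wise: it computes haystack[i].find(needle[i]),
# returning -1 where the substring is absent.
result = df.assign(col3=np.char.find(haystack, needle))
print(result)
```

Note that `astype(str)` copies every string into a fixed-width array sized for the longest entry, so memory use may matter when a few strings are very long.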

2 Comments

Mohamed AL ANI Over a year ago

Thanks, this seems to be what I am looking for. Do you know if there is a Pandas equivalent?

piRSquared Over a year ago

The rough equivalent is `df.col2.str.find('cat')`, but it takes a single substring for the whole column, so it cannot do a pair-wise find.
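To illustrate the difference described in this comment, here is a small sketch on the question's sample data. The row-wise `apply` variant is an assumption about how one might stay within pandas; it is not something the commenter proposed, and it still runs a Python-level call per row.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {
        "col1": ["cat", "dog", "horse"],
        "col2": ["the cat", "a nice dog", "horse is here"],
    },
    index=[1, 2, 3],
)

# Series.str.find searches every row for the SAME substring.
single = df["col2"].str.find("cat")

# Pair-wise variant: call Python's str.find once per row,
# using that row's own col1 value as the substring.
pairwise = df.apply(lambda row: row["col2"].find(row["col1"]), axis=1)

print(single.tolist())
print(pairwise.tolist())
```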

sacuL · Accepted Answer · 2018-10-11 16:44:50Z

5

When working with strings in pandas, loops or list comprehensions are often faster than the built-in string methods. In your case, it can be a pretty short one:

    df['col3'] = [i2.index(i1) for i1,i2 in zip(df.col1,df.col2)]

    >>> df
       col1   col2           col3
    1  cat    the cat        4
    2  dog    a nice dog     7
    3  horse  horse is here  0
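One caveat with the comprehension above: `str.index` raises `ValueError` when the substring is missing, whereas `str.find` returns -1, which matches what the NumPy approach reports. The sketch below swaps in `str.find`; the fourth row is a hypothetical addition to the question's data, included only to show the no-match case.

```python
import pandas as pd

df = pd.DataFrame(
    {
        # The last row is hypothetical: "bird" does not occur in its col2 value.
        "col1": ["cat", "dog", "horse", "bird"],
        "col2": ["the cat", "a nice dog", "horse is here", "no match here"],
    }
)

# str.find returns -1 instead of raising when the needle is absent.
df["col3"] = [hay.find(needle) for needle, hay in zip(df["col1"], df["col2"])]
print(df)
```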

3 Comments

Mohamed AL ANI Over a year ago

My DataFrame has 1 million rows and my strings can be up to 50,000 characters long. I don't really want to try this. Pandas/NumPy functions are made to speed up this kind of heavy work.

sacuL Over a year ago

It's true that pandas and numpy functions are generally made to speed things up, but in the case of strings, the methods provided often fail to provide speed boosts. I haven't timed it with a big dataframe of large strings in your case, but it may not be as slow as you expect.

Jan Wilamowski Over a year ago

To my surprise, this ran about as fast as the numpy solution on my dataset with 5M entries and string lengths of 1-39 (substrings) and 100-150 (search target).
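Anyone wanting to check this on their own data could time both approaches along these lines. This is only a sketch under stated assumptions: the data is synthetic, the row count and string lengths are arbitrary choices rather than the commenter's dataset, and the function names are illustrative. Results will vary with string length, NumPy version, and hardware.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 10_000  # arbitrary size for a quick run, not the commenter's 5M
letters = np.array(list("abcdefghij"))


def random_string(length):
    """Build a random lowercase string of the given length."""
    return "".join(rng.choice(letters, size=length))


col2 = [random_string(120) for _ in range(n_rows)]
# Cut each needle out of its own haystack so every row has a match.
col1 = [s[50:60] for s in col2]
df = pd.DataFrame({"col1": col1, "col2": col2})


def with_numpy():
    haystack = df["col2"].to_numpy().astype(str)
    needle = df["col1"].to_numpy().astype(str)
    return np.char.find(haystack, needle)


def with_comprehension():
    return [hay.find(needle) for needle, hay in zip(df["col1"], df["col2"])]


# Both approaches should report identical positions.
assert with_numpy().tolist() == with_comprehension()

for fn in (with_numpy, with_comprehension):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```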
