
I have a DataFrame with two columns:

        col1           col2
    1    cat        the cat
    2    dog     a nice dog
    3  horse  horse is here

I need to find the position of each string of col1 in col2.

The expected result is:

        col1           col2  col3
    1    cat        the cat     4
    2    dog     a nice dog     7
    3  horse  horse is here     0

There must be a simple way to do this without painful loops, but I can't find it.

2 Answers

8

numpy.core.defchararray.find

    from numpy.core.defchararray import find

    a = df.col2.values.astype(str)
    b = df.col1.values.astype(str)
    df.assign(col3=find(a, b))

        col1           col2  col3
    1    cat        the cat     4
    2    dog     a nice dog     7
    3  horse  horse is here     0
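For reference, here is a self-contained sketch of the same idea that rebuilds the question's sample frame so it can be run as is. It assumes `np.char.find`, the public alias of the function above, is an acceptable substitute for the `numpy.core` import; the variable names `haystack`, `needle`, and `result` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "col1": ["cat", "dog", "horse"],
        "col2": ["the cat", "a nice dog", "horse is here"],
    },
    index=[1, 2, 3],
)

# Convert both columns to NumPy string arrays so the search runs element-wise.
haystack = df["col2"].to_numpy().astype(str)
needle = df["col1"].to_numpy().astype(str)

# np.char.find pairs haystack[i] with needle[i] and returns -1 when there is no match.
result = df.assign(col3=np.char.find(haystack, needle))
print(result)
```

One thing to keep in mind: `astype(str)` produces a fixed-width Unicode array, so every element takes as much room as the longest string in the column. With very long strings this may need far more memory than the original object column.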

2 Comments

Thanks, this seems to be what I am looking for. Do you know if there is a Pandas equivalent?
The rough equivalent is df.col2.str.find('cat') but it fails to do pair-wise find.
5

In pandas when working with strings, often loops or list comprehensions will be faster than the built-in string methods. In your case, it can be a pretty short one:

    df['col3'] = [i2.index(i1) for i1,i2 in zip(df.col1,df.col2)]

    >>> df
        col1           col2  col3
    1    cat        the cat     4
    2    dog     a nice dog     7
    3  horse  horse is here     0
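Note that `str.index` raises `ValueError` when the substring is missing. If some rows might not contain a match, a variant using `str.find` returns -1 for those rows instead, which also matches what the NumPy function does. The sketch below is only an illustration: the third row is changed to a hypothetical non-matching value (`cow`) to show that case.

```python
import pandas as pd

# Sample data from the question, except the last col1 value,
# which is a hypothetical non-matching substring.
df = pd.DataFrame(
    {
        "col1": ["cat", "dog", "cow"],
        "col2": ["the cat", "a nice dog", "horse is here"],
    },
    index=[1, 2, 3],
)

# str.find returns the first match position, or -1 if the substring is absent.
df["col3"] = [text.find(sub) for sub, text in zip(df["col1"], df["col2"])]
print(df)
```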

3 Comments

My DataFrame has 1 million rows and my strings can be 50,000 characters long. I don't really want to try this. Pandas/NumPy functions are made to speed up this kind of heavy work.
It's true that pandas and NumPy functions are generally made to speed things up, but for strings the provided methods often fail to deliver a speed boost. I haven't timed it on a big DataFrame of large strings like yours, but it may not be as slow as you expect.
To my surprise, this ran about as fast as the NumPy solution on my dataset with 5M entries and string lengths of 1-39 (substrings) and 100-150 (search target).
