I have two very large series that contain only join keys. Without using the index (meaningless in this case) I want to left join one series to another by the values in the most efficient way possible.

Right now, I add a column of 1's to each series so that I can use pd.merge with a left join and identify whether each key in left also exists in right.

I'm sure I can do this without creating the two unused columns, but pd.concat seems to want to use indices for the join. Is there a way to left join two series on values, and is there a faster NumPy version of this?

For example:

a = pd.Series([1,2,3])
b = pd.Series([1,3,6])

I want to return an array or Series that tells me if each value in a is in b in the most efficient way possible.

 [True, False, True] 
  • Can you add samples? Commented Jan 18, 2016 at 10:00
  • Example values are up. Commented Jan 18, 2016 at 10:10

1 Answer

You can try:

c = a.isin(b) 

that returns:

0     True
1    False
2     True
dtype: bool

or, if you want an array, you can use:

c.values 

that returns:

array([ True, False, True], dtype=bool) 
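
For the NumPy version the question asks about, here is a minimal sketch. It assumes NumPy 1.13 or later (for `np.isin`) and pandas 0.24 or later (for `Series.to_numpy`). The second half shows a left join on values that uses the `indicator` parameter of `pd.merge` instead of dummy columns of 1's. The names `mask_np`, `merged`, `mask_merge` and the column name `key` are illustrative only.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 2, 3])
b = pd.Series([1, 3, 6])

# Membership test on the underlying arrays; the index plays no role.
mask_np = np.isin(a.to_numpy(), b.to_numpy())

# Left join on values without dummy columns: indicator=True adds a
# "_merge" column whose value is "both" when the key also exists in b.
# b is de-duplicated first so the left join cannot multiply rows of a.
merged = pd.merge(
    a.to_frame("key"),
    b.drop_duplicates().to_frame("key"),
    on="key",
    how="left",
    indicator=True,
)
mask_merge = (merged["_merge"] == "both").to_numpy()

print(mask_np)
print(mask_merge)
```

Both masks should agree with `a.isin(b)`. If only the boolean membership is needed, the merge is probably unnecessary work compared with `isin`.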

1 Comment

Nice! Does this scale well? I have two 20MM record series...
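
One way to find out is to time it on synthetic data of a similar shape. The following is a rough sketch only: the random integer keys, the key range, the seed and the size `n` are all assumptions, and `n` is kept at 1,000,000 so it runs quickly (raise it toward 20,000,000 to match the real series). It assumes NumPy 1.17 or later for `np.random.default_rng`. No timings are claimed here, since they depend on the machine and on the key dtype.

```python
import time

import numpy as np
import pandas as pd

n = 1_000_000  # hypothetical size; raise toward 20_000_000 for the real case
rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Random integer keys; roughly one key in ten of a is expected to hit b.
a = pd.Series(rng.integers(0, 10 * n, size=n))
b = pd.Series(rng.integers(0, 10 * n, size=n))

start = time.perf_counter()
mask = a.isin(b)
elapsed = time.perf_counter() - start
print(f"isin on {n:,} keys took {elapsed:.3f} s")

# Sanity check on a small sample against a plain Python set.
b_set = set(b.tolist())
sample = a.iloc[:1000]
expected = [x in b_set for x in sample.tolist()]
print(mask.iloc[:1000].tolist() == expected)
```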
