pandas complicated join operation

Question

I would like to implement a specific join operation with the following requirements:

I have a data frame in the following format, where the index is datetime and I have columns from 0 to N (9 in this example)

df1:

                 0   1   2   3   4   5   6   7   8   9
    2001-01-01   2  53  35  91  43  31   7  87  25  68
    2001-01-02  12  97  86  59  51   7  75  25   6  40
    2001-01-03  73  82  87   1  46  66  17  42  96  61

I also have another dataframe that contains the columns to be chosen for each datetime index, i.e. the values are 0 to N:

                0
    2001-01-01  9
    2001-01-02  5
    2001-01-03  4

I would like to select the underlying values of the first dataframe, where

    index df1 = index df2
    columns df1 = value df2

For example the results for the above example should look like this:

    join(df1,df2)=
                 0
    2001-01-01  68
    2001-01-02   7
    2001-01-03  46

So, could there be some date indices missing in df2 that are part of df1, or vice versa? If so, what should the desired behavior be? Could you post a sample case for such a situation?
– Divakar, Commented Jul 14, 2016 at 18:19

jezrael · Accepted Answer · 2016-07-14 18:24:25Z

You can use lookup:

    print(df1.lookup(df1.index, df2.iloc[:,0]))
    [68  7 46]

    print(pd.DataFrame(df1.lookup(df1.index, df2.iloc[:,0]), index=df1.index))
                 0
    2001-01-01  68
    2001-01-02   7
    2001-01-03  46

Another solution with squeeze:

    print(pd.DataFrame(df1.lookup(df1.index, df2.squeeze()), index=df1.index))
                 0
    2001-01-01  68
    2001-01-02   7
    2001-01-03  46
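Note that `DataFrame.lookup` was deprecated in pandas 1.2 and removed in pandas 2.0, so the calls above only run on older releases. The following is a minimal sketch of a label-based replacement built on `pd.factorize` and `reindex`. It assumes that df2 has the same index, in the same order, as df1 and that every value in df2 is an existing column label of df1; the frame construction simply reproduces the sample data from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question (rebuilt here so the sketch is runnable).
idx = pd.to_datetime(["2001-01-01", "2001-01-02", "2001-01-03"])
df1 = pd.DataFrame(
    [[2, 53, 35, 91, 43, 31, 7, 87, 25, 68],
     [12, 97, 86, 59, 51, 7, 75, 25, 6, 40],
     [73, 82, 87, 1, 46, 66, 17, 42, 96, 61]],
    index=idx,
)
df2 = pd.DataFrame([9, 5, 4], index=idx)

# Assumption: df2 is already aligned row by row with df1.
labels = df2.iloc[:, 0]

# codes[i] is the position of labels[i] within the unique labels.
codes, uniques = pd.factorize(labels)

# Keep only the needed columns, in the order of `uniques`,
# then pick one value per row by position.
picked = df1.reindex(columns=uniques).to_numpy()[np.arange(len(df1)), codes]

result = pd.DataFrame(picked, index=df1.index)
print(result)
```

Because the selection goes through column labels rather than raw positions, this also works when the columns of df1 are not simply 0 to N.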

I would think this would be faster, as it avoids any conversion to array. And good to see a NumPy alternative for pandas to do such fancy indexing.

Divakar · Accepted Answer · 2016-07-14 18:11:39Z


Something along these lines, using NumPy's integer array indexing:

    import numpy as np
    import pandas as pd

    vals = df1.values[np.arange(df1.shape[0]), df2[0].values]  # row i, column position df2[0][i]
    df_out = pd.DataFrame(vals, index=df1.index)
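The comment on the question asks what should happen when the two frames do not share all dates, and the question never specifies this. One possible choice, stated here purely as an assumption, is to keep only the dates present in both frames. The sketch below extends the NumPy indexing above in that direction; the df2 used here is hypothetical (it drops 2001-01-02 and adds 2001-01-04) and, as in the original snippet, the values of df2 are treated as column positions.

```python
import numpy as np
import pandas as pd

# df1 as in the question.
df1 = pd.DataFrame(
    [[2, 53, 35, 91, 43, 31, 7, 87, 25, 68],
     [12, 97, 86, 59, 51, 7, 75, 25, 6, 40],
     [73, 82, 87, 1, 46, 66, 17, 42, 96, 61]],
    index=pd.to_datetime(["2001-01-01", "2001-01-02", "2001-01-03"]),
)

# Hypothetical df2: 2001-01-02 is missing, 2001-01-04 does not exist in df1.
df2 = pd.DataFrame(
    [9, 4, 0],
    index=pd.to_datetime(["2001-01-01", "2001-01-03", "2001-01-04"]),
)

# Assumed behavior: restrict the selection to dates found in both frames.
common = df1.index.intersection(df2.index)

# Row positions of the shared dates within df1.
rows = df1.index.get_indexer(common)

# Column positions requested for those dates.
cols = df2.loc[common, 0].to_numpy()

vals = df1.to_numpy()[rows, cols]
df_out = pd.DataFrame(vals, index=common)
print(df_out)
```

Other behaviors, such as keeping every date of df1 and filling the gaps with NaN, would be equally defensible; which one is right depends on the asker's intent.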

edited Jul 14, 2016 at 18:11

answered Jul 14, 2016 at 18:06


Divakar Over a year ago

@motam79 Also look into @jezrael's solution, as that might be faster.
