How to use indexing by matching strings in data frame in pandas

Question

I am trying to solve the following problem. I have two data sets, df1 and df2:

df1
    NameSP       Val Char1 BVA
0   'ACCR'  0.091941     A  Y'
1   'SDRE'  0.001395     S  Y'
2   'ACUZ'  0.121183     A  N'
3   'SRRE'  0.001512     S  N'
4   'FFTR'  0.035609     F  N'
5   'STZE'  0.000637     S  N'
6   'AHZR'  0.001418     A  Y'
7   'DEES'  0.000876     D  N'
8   'UURR'  0.023878     U  Y'
9   'LLOH'  0.004371     L  Y'
10  'IUUT'  0.049102     I  N'

df2
   NameSP   Val1   Glob
0  'ACCR'  0.234  20000
1  'FFTR'  0.222  10000
2  'STZE'  0.001   5000
3  'DEES'  0.006   2000
4  'UURR'  0.134  20000
5  'LLOH'  0.034  10000

I would like to find where the names in df2 occur in df1 and then use the resulting index vector for various matrix operations. This would be similar to strmatch(A,B,'exact') in MATLAB. I can get the index by using .iloc and then .isin, as in the following code:

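import pandas as pd
import numpy as np

# Raw strings stop the Windows backslashes from being read as escape sequences
df1 = pd.read_excel(r'C:\PYTHONCODES\LINEAROPT\TEST_DATA1.xlsx')
df2 = pd.read_excel(r'C:\PYTHONCODES\LINEAROPT\TEST_DATA2.xlsx')
print(df1)
print(df2)

# First column (NameSP) of each frame
ddf1 = df1.iloc[:, 0]
ddf2 = df2.iloc[:, 0]

# Rows of df1 whose name also appears in df2
pindex = ddf1[ddf1.isin(ddf2)]
print(pindex.index)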

which gives me:

Int64Index([0, 4, 5, 7, 8, 9], dtype='int64')
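A minimal sketch of turning this match into NumPy-usable positions, including a strmatch-style lookup (the position of each df2 name inside df1). It assumes the names are plain strings (the quotes in the printed tables are dropped) and that NameSP is unique in df1, which Index.get_indexer requires.

```python
import numpy as np
import pandas as pd

# Sample names typed in from the question (assumption: plain strings, no quotes)
df1 = pd.DataFrame({
    "NameSP": ["ACCR", "SDRE", "ACUZ", "SRRE", "FFTR", "STZE",
               "AHZR", "DEES", "UURR", "LLOH", "IUUT"],
})
df2 = pd.DataFrame({
    "NameSP": ["ACCR", "FFTR", "STZE", "DEES", "UURR", "LLOH"],
})

# Boolean mask over df1: True where the df1 name also occurs in df2
mask = df1["NameSP"].isin(df2["NameSP"]).to_numpy()

# Integer positions of the matches; equals pindex.index for a default RangeIndex
rows = np.flatnonzero(mask)

# strmatch-style lookup: position of each df2 name inside df1 (-1 if absent).
# get_indexer needs the df1 names to be unique.
pos_in_df1 = pd.Index(df1["NameSP"]).get_indexer(df2["NameSP"])

print(rows)        # [0 4 5 7 8 9]
print(pos_in_df1)  # [0 4 5 7 8 9]
```

The two arrays coincide here only because df2 lists its names in df1's order; rows follows df1's order, while pos_in_df1 follows df2's order and stays correct if df2 is sorted differently.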

But I cannot find a way to use this index to map values and build my arrays. For example, I would like a vector with the same number of elements as df1, holding the Val1 values from df2 at the matched positions and zeros everywhere else. It should look like this:

0.234 0 0 0 0.222 0.001 0 0.006 0.134 0.034 0

A second mapping problem: how can I use the same index to build a vector that contains the Val values from df1 at the matched rows and zeros everywhere else? This time it should look like:

0.091941 0.0 0.0 0.0 0.035609 0.000637 0.0 0.000876 0.023878 0.004371 0.0

Any idea how to do that in an efficient and elegant way?

Thanks for the help!
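For comparison with the pandas answer, here is a minimal NumPy sketch that uses a positional index directly, in the spirit of the question: preallocate zeros of length len(df1) and scatter values into the matched positions. The sample data is typed in from the question, assuming plain-string names that are unique in df1.

```python
import numpy as np
import pandas as pd

# Sample data typed in from the question (assumption: plain-string names)
df1 = pd.DataFrame({
    "NameSP": ["ACCR", "SDRE", "ACUZ", "SRRE", "FFTR", "STZE",
               "AHZR", "DEES", "UURR", "LLOH", "IUUT"],
    "Val": [0.091941, 0.001395, 0.121183, 0.001512, 0.035609, 0.000637,
            0.001418, 0.000876, 0.023878, 0.004371, 0.049102],
})
df2 = pd.DataFrame({
    "NameSP": ["ACCR", "FFTR", "STZE", "DEES", "UURR", "LLOH"],
    "Val1": [0.234, 0.222, 0.001, 0.006, 0.134, 0.034],
})

# Position of each df2 row inside df1 (-1 where a df2 name is missing from df1)
pos = pd.Index(df1["NameSP"]).get_indexer(df2["NameSP"])
found = pos >= 0

# Vector 1: df2's Val1 scattered into df1's row positions, zeros elsewhere
vec_val1 = np.zeros(len(df1))
vec_val1[pos[found]] = df2["Val1"].to_numpy()[found]

# Vector 2: df1's own Val kept at the matched rows, zeros elsewhere
vec_val = np.zeros(len(df1))
vec_val[pos[found]] = df1["Val"].to_numpy()[pos[found]]

print(vec_val1)
print(vec_val)
```

Mapping by name through get_indexer, rather than relying on the order of pindex.index, keeps vector 1 correct even when df2 rows are not in df1's order.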

ansev · Accepted Answer · 2020-04-15 17:02:21Z


First problem

df2.set_index('NameSP')['Val1'].reindex(df1['NameSP']).fillna(0)
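A self-contained check of this one-liner on the question's sample data (typed in by hand, assuming plain-string names that are unique in df2). It also shows Series.map as an alternative, which is not part of the answer: it produces the same values but keeps df1's own RangeIndex instead of indexing the result by NameSP.

```python
import numpy as np
import pandas as pd

# Sample data typed in from the question (assumption: plain-string names)
df1 = pd.DataFrame({
    "NameSP": ["ACCR", "SDRE", "ACUZ", "SRRE", "FFTR", "STZE",
               "AHZR", "DEES", "UURR", "LLOH", "IUUT"],
})
df2 = pd.DataFrame({
    "NameSP": ["ACCR", "FFTR", "STZE", "DEES", "UURR", "LLOH"],
    "Val1": [0.234, 0.222, 0.001, 0.006, 0.134, 0.034],
})

# Lookup Series: Val1 indexed by name (assumes NameSP is unique in df2)
lookup = df2.set_index("NameSP")["Val1"]

# The answer's approach: one entry per df1 name, NaN where unmatched, then 0
by_reindex = lookup.reindex(df1["NameSP"]).fillna(0)

# Alternative (not from the answer): Series.map keeps df1's RangeIndex
by_map = df1["NameSP"].map(lookup).fillna(0)

# Plain ndarray for matrix operations
vec = by_reindex.to_numpy()
print(vec)
```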

Second problem

df1['Val'].where(df1['NameSP'].isin(df2['NameSP']), 0)
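A self-contained check of the second problem on the question's sample data, using df1's Val column (the column the question asks about; df1 has no Val1). The sample values are typed in by hand, assuming plain-string names. numpy.where is shown as an equivalent that returns a plain ndarray instead of a Series.

```python
import numpy as np
import pandas as pd

# Sample data typed in from the question (assumption: plain-string names)
df1 = pd.DataFrame({
    "NameSP": ["ACCR", "SDRE", "ACUZ", "SRRE", "FFTR", "STZE",
               "AHZR", "DEES", "UURR", "LLOH", "IUUT"],
    "Val": [0.091941, 0.001395, 0.121183, 0.001512, 0.035609, 0.000637,
            0.001418, 0.000876, 0.023878, 0.004371, 0.049102],
})
df2 = pd.DataFrame({
    "NameSP": ["ACCR", "FFTR", "STZE", "DEES", "UURR", "LLOH"],
})

# True for df1 rows whose name also appears in df2
mask = df1["NameSP"].isin(df2["NameSP"])

# Keep Val where matched, replace every other entry with 0 (length of df1)
vec = df1["Val"].where(mask, 0)

# NumPy equivalent returning a plain ndarray
vec_np = np.where(mask, df1["Val"], 0.0)
print(vec.to_numpy())
```

Because the mask is built over df1, the result always has len(df1) elements, which matches the expected output in the question.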


2 Comments

Chem1967 Over a year ago

Fantastic! This works great for the first problem. Thanks. For the second one it's perfect too. Thanks so much!

ansev Over a year ago

Should the result for the second problem have the length of df1 or of df2?
