Pandas check if cell value is between range from another data frame

Question

So I have two data frames. Df1 has names and values as follows:

Source	Target	volume
dog	cat	3
cat	rat	2

Df2 has range for each name as follows:

name	min	max
dog	1	2
cat	1	3

what I would like to achieve is to check if the values in source or target columns in df1 exist in name column in df2, and if they exist to check if the volume is between them min and max values and result a data frame as shown below:

Source	Target	volume	S. in range	S. min	S. max	T. in range	T. min	T.max
dog	cat	3	No	1	2	Yes	1	2
cat	rat	2	yes	1	3	-	-	-

If the name does no exist in df2 there should be a line or similar indicating that the name was not found.

Im not entirely sure how to get this work as I need to check for values from two columns and combine them into one data frame.

Should the column values in Df1 be volume instead ?

SeaBean
– SeaBean

2021-06-22 06:23:47 +00:00
Commented Jun 22, 2021 at 6:23 — SeaBean
– SeaBean, Commented Jun 22, 2021 at 6:23
@SeaBean - I think yes, so edited question.

jezrael
– jezrael

2021-06-22 06:24:26 +00:00
Commented Jun 22, 2021 at 6:24 — jezrael
– jezrael, Commented Jun 22, 2021 at 6:24
@jezrael Excellent!

SeaBean
– SeaBean

2021-06-22 06:25:08 +00:00
Commented Jun 22, 2021 at 6:25 — SeaBean
– SeaBean, Commented Jun 22, 2021 at 6:25

jezrael · Accepted Answer · 2021-06-22 06:22:13Z

Use DataFrame.merge with left join and rename columns for new DataFrames, then join in concat and set in range columns by numpy.select for also test if no match, so assigned Nones and Series.between:

df3 = (df1[['Source','volume']].rename(columns={'Source':'in range'}) .merge(df2.rename(columns={'name':'in range'}), on='in range', how='left')) df4 = (df1[['Target','volume']].rename(columns={'Target':'in range'}) .merge(df2.rename(columns={'name':'in range'}), on='in range', how='left')) df = pd.concat([df1, df3.drop('volume',axis=1).add_prefix('S.'), df4.drop('volume',axis=1).add_prefix('T.')], axis=1) m10 = df['S.min'].isna() m11 = df['volume'].between(df['S.min'], df['S.max']) m20 = df['T.min'].isna() m21 = df['volume'].between(df['T.min'], df['T.max']) df['S.in range'] = np.select([m10, m11], [None, 'Yes'], 'No') df['T.in range'] = np.select([m20, m21], [None, 'Yes'], 'No') print (df) Source Target volume S.in range S.min S.max T.in range T.min T.max 0 dog cat 3 No 1 2 Yes 1.0 3.0 1 cat rat 2 Yes 1 3 None NaN NaN

SeaBean · Accepted Answer · 2021-06-22 07:12:25Z

You can use .set_index() on df2 to make it a lookup table by name. Then use .map() to map the Source and Target to get the min, max values:

df2a = df2.set_index('name') df3 = df1.copy() df3['S.in range'] = np.where(df3['Source'].isin(df2['name']), df3['volume'].between(df3['Source'].map(df2a['min']), df3['Source'].map(df2a['max'])), None) df3['S.min'] = df3['Source'].map(df2a['min']) df3['S.max'] = df3['Source'].map(df2a['max']) df3['T.in range'] = np.where(df3['Target'].isin(df2['name']), df3['volume'].between(df3['Target'].map(df2a['min']), df3['Target'].map(df2a['max'])), None) df3['T.min'] = df3['Target'].map(df2a['min']) df3['T.max'] = df3['Target'].map(df2a['max']) df3['S.in range'] = df3['S.in range'].replace({False: 'No', True: 'Yes'}) df3['T.in range'] = df3['T.in range'].replace({False: 'No', True: 'Yes'})

Result:

print(df3) Source Target volume S.in range S.min S.max T.in range T.min T.max 0 dog cat 3 No 1 2 Yes 1.0 3.0 1 cat rat 2 Yes 1 3 None NaN NaN

(I suppose the column values should be named volume according to description and expected result. If not the case, can use values in the codes above and rename it if necessary for desired output).

Collectives™ on Stack Overflow

Pandas check if cell value is between range from another data frame

2 Answers 2

Comments

Comments

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Related