3

So I have two data frames. Df1 has names and values as follows:

Source Target volume
dog cat 3
cat rat 2

Df2 has range for each name as follows:

name min max
dog 1 2
cat 1 3

what I would like to achieve is to check if the values in source or target columns in df1 exist in name column in df2, and if they exist to check if the volume is between them min and max values and result a data frame as shown below:

Source Target volume S. in range S. min S. max T. in range T. min T.max
dog cat 3 No 1 2 Yes 1 2
cat rat 2 yes 1 3 - - -

If the name does no exist in df2 there should be a line or similar indicating that the name was not found.

Im not entirely sure how to get this work as I need to check for values from two columns and combine them into one data frame.

3
  • Should the column values in Df1 be volume instead ? Commented Jun 22, 2021 at 6:23
  • @SeaBean - I think yes, so edited question. Commented Jun 22, 2021 at 6:24
  • @jezrael Excellent! Commented Jun 22, 2021 at 6:25

2 Answers 2

1

Use DataFrame.merge with left join and rename columns for new DataFrames, then join in concat and set in range columns by numpy.select for also test if no match, so assigned Nones and Series.between:

df3 = (df1[['Source','volume']].rename(columns={'Source':'in range'}) .merge(df2.rename(columns={'name':'in range'}), on='in range', how='left')) df4 = (df1[['Target','volume']].rename(columns={'Target':'in range'}) .merge(df2.rename(columns={'name':'in range'}), on='in range', how='left')) df = pd.concat([df1, df3.drop('volume',axis=1).add_prefix('S.'), df4.drop('volume',axis=1).add_prefix('T.')], axis=1) m10 = df['S.min'].isna() m11 = df['volume'].between(df['S.min'], df['S.max']) m20 = df['T.min'].isna() m21 = df['volume'].between(df['T.min'], df['T.max']) df['S.in range'] = np.select([m10, m11], [None, 'Yes'], 'No') df['T.in range'] = np.select([m20, m21], [None, 'Yes'], 'No') print (df) Source Target volume S.in range S.min S.max T.in range T.min T.max 0 dog cat 3 No 1 2 Yes 1.0 3.0 1 cat rat 2 Yes 1 3 None NaN NaN 
Sign up to request clarification or add additional context in comments.

Comments

1

You can use .set_index() on df2 to make it a lookup table by name. Then use .map() to map the Source and Target to get the min, max values:

df2a = df2.set_index('name') df3 = df1.copy() df3['S.in range'] = np.where(df3['Source'].isin(df2['name']), df3['volume'].between(df3['Source'].map(df2a['min']), df3['Source'].map(df2a['max'])), None) df3['S.min'] = df3['Source'].map(df2a['min']) df3['S.max'] = df3['Source'].map(df2a['max']) df3['T.in range'] = np.where(df3['Target'].isin(df2['name']), df3['volume'].between(df3['Target'].map(df2a['min']), df3['Target'].map(df2a['max'])), None) df3['T.min'] = df3['Target'].map(df2a['min']) df3['T.max'] = df3['Target'].map(df2a['max']) df3['S.in range'] = df3['S.in range'].replace({False: 'No', True: 'Yes'}) df3['T.in range'] = df3['T.in range'].replace({False: 'No', True: 'Yes'}) 

Result:

print(df3) Source Target volume S.in range S.min S.max T.in range T.min T.max 0 dog cat 3 No 1 2 Yes 1.0 3.0 1 cat rat 2 Yes 1 3 None NaN NaN 

(I suppose the column values should be named volume according to description and expected result. If not the case, can use values in the codes above and rename it if necessary for desired output).

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.