Determining if a column value is between a conditional range based on another column

Question

I have a dataframe that looks as follows:

 data = np.array([[5, 'red', 2,6, 8, 10], [11, 'red', 3,9,6,15], [8, 'blue', 0, 3, 5, 10], [2, 'blue', 1, 2, 3, 4]]) df = pd.DataFrame(data, columns = ['A','B','red_lower', 'red_upper', 'blue_lower', 'blue_upper'])

 A B red_lower red_upper blue_lower blue_upper 0 5 red 2 6 8 10 1 11 red 3 9 6 15 2 8 blue 0 3 5 10 3 2 blue 1 2 3 4

I'd like to create an additional column that tells me if the value in a column A is in the range of the color given in column B. For example, in row 0, since 5 has the designation red, I will check if 5 is between 2 and 6. It is, so I will have the new column have a 1.

Desired result:

 A B red_lower red_upper blue_lower blue_upper in_range 0 5 red 2 6 8 10 1 1 11 red 3 9 6 15 0 2 8 blue 0 3 5 10 1 3 2 blue 1 2 3 4 0

I've tried to write a loop, but I'm getting many series errors. I really dont want to have to split up the dataframe (by color), but maybe that's the way to go? (in my actual dataframe, there are six different 'colors', not just two).

Thank you!

EDIT: bonus if we have the additional column tell me if the value is above or below the range! For example, in row 1, 11 is outside the range, so is too high. Table should look this way:

 A B red_lower red_upper blue_lower blue_upper in_range 0 5 red 2 6 8 10 inside 1 11 red 3 9 6 15 above 2 8 blue 0 3 5 10 inside 3 2 blue 1 2 3 4 below

for loop or groupby should be the way

BENY
– BENY

2019-07-08 21:08:45 +00:00
Commented Jul 8, 2019 at 21:08 — BENY
– BENY, Commented Jul 8, 2019 at 21:08

BENY · Accepted Answer · 2019-07-08 21:22:39Z

3

Here is using groupby split the df and most of step handled by the definition , which means you do not need input the different color each time

l=[] for name,x in df.groupby('B',sort=False): s1=(x.A >= x.filter(like=name).iloc[:, 0]) & (x.A <= x.filter(like=name).iloc[:, 1]) s2=x.A<x.filter(like=name).iloc[:, 0] l.extend(np.select([s1,s2],['inside','below'],default='above').tolist()) df['in_range']=l df Out[64]: A B red_lower red_upper blue_lower blue_upper in_range 0 5 red 2 6 8 10 inside 1 11 red 3 9 6 15 above 2 8 blue 0 3 5 10 inside 3 2 blue 1 2 3 4 below

answered Jul 8, 2019 at 21:22

BENY

324k22 gold badges176 silver badges250 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

shakebeforeopening Over a year ago

I think the issue here is my 'color' strings might not show up exactly in the column name - there are extra dashes between words, and there are capitalization issues, so the filter isn't perfect. When running this, I get a 'single positional indexer is out-of-bounds' error. I'll try to play with it to edit my column names and B entries to make them match up nicer perhaps?

shakebeforeopening Over a year ago

This needed lots of tweaking for the mess of data I'm working with, but this was really helpful, thank you!

shakebeforeopening Over a year ago

I want to edit this because there is a problem in the code here. This base code DOES NOT WORK if the colors are mixed. If this table is changed to red/blue/red/blue, the designations are off. I'm going to uncheck this as a solution for now - any thoughts @WeNYoBen ?

BENY Over a year ago

@shakebeforeopening then you should sort your columns first , then after assign it back , you can always sort_index get back the original ordered ..My code work , just need to have more effort on it, Likedf=df.sort_values('B'), then using above code , then df=df.sort_index()

user3483203 · Accepted Answer · 2019-07-08 21:39:15Z

`justify` + `broadcast` + `mask` + `logical_and`

You can use some nifty broadcasting here, and the function justify from another answer. This assumes that each color has a single valid range. It also assumes that all of your numeric columns are in fact numeric.

values = df.A.values colors = df.B.values range_frame = df.iloc[:, 2:] ranges = range_frame.columns.str.split('_').str[0].values m = colors != ranges[:, None] masked = range_frame.mask(m) jf = justify(masked.values, invalid_val=np.nan)[:, :2] ir = np.logical_and(jf[:, 0] < values, values < jf[:, 1]).astype(int) c1 = values <= jf[:, 0] c2 = values >= jf[:, 1] irl = np.select([c1, c2], ['below', 'above'], 'inside') df.assign(in_range=ir, in_range_flag=irl)

 A B red_lower red_upper blue_lower blue_upper in_range in_range_flag 0 5 red 2 6 8 10 1 inside 1 11 red 3 9 6 15 0 above 2 8 blue 0 3 5 10 1 inside 3 3 blue 1 2 3 4 0 below

`stack` + `reshape` + `logical_and`

Again making the same assumptions as the first answer.

u = df.set_index(['A', 'B']).stack().rename_axis(['A', 'B', 'flag']).reset_index() frame = u[u.flag.str.split('_').str[0] == u.B] values = frame[::2].A.values ranges = frame[0].values.reshape(-1, 2) ir = np.logical_and(ranges[:, 0] < values, values < ranges[:, 1]) c1 = values <= ranges[:, 0] c2 = values >= ranges[:, 1] irl = np.select([c1, c2], ['below', 'above'], 'inside') df.assign(in_range=ir, in_range_flag=irl)

Here is the definition for the justify function by @Divakar:

def justify(a, invalid_val=0, axis=1, side='left'): """ Justifies a 2D array Parameters ---------- A : ndarray Input array to be justified axis : int Axis along which justification is to be made side : str Direction of justification. It could be 'left', 'right', 'up', 'down' It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0. """ if invalid_val is np.nan: mask = ~np.isnan(a) else: mask = a!=invalid_val justified_mask = np.sort(mask,axis=axis) if (side=='up') | (side=='left'): justified_mask = np.flip(justified_mask,axis=axis) out = np.full(a.shape, invalid_val) if axis==1: out[justified_mask] = a[mask] else: out.T[justified_mask.T] = a.T[mask.T] return out

I definitely should have mentioned that I have some empty strings (not NULLS) in certain ranges where they aren't provided (it's not the cleanest data in the world). Each color does match up to a 'range', but that range may be empty. I could amend this code with an if statement that checks if the range is empty first to put the string 'not provided' in the additional column? But I'm not sure how that would affect the justify function.

Collectives™ on Stack Overflow

Determining if a column value is between a conditional range based on another column

2 Answers 2

4 Comments

`justify` + `broadcast` + `mask` + `logical_and`

`stack` + `reshape` + `logical_and`

1 Comment

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

4 Comments

justify + broadcast + mask + logical_and

stack + reshape + logical_and

1 Comment

Linked

Related

`justify` + `broadcast` + `mask` + `logical_and`

`stack` + `reshape` + `logical_and`