
I have a data.frame in R. I want to test two different conditions on two different columns, but I want these conditions to be inclusive. Therefore, I would like to use "OR" to combine them. I have used the following syntax before, with a lot of success, when I wanted an "AND" condition.

my.data.frame <- data[(data$V1 > 2) & (data$V2 < 4), ] 

But I don't know how to use an 'OR' in the above.


5 Answers

my.data.frame <- subset(data , V1 > 2 | V2 < 4) 

An alternative solution that mimics the behavior of subset and is more appropriate for use inside a function body:

new.data <- data[ which( data$V1 > 2 | data$V2 < 4) , ] 

Some people criticize the use of which as unnecessary, but it does prevent NA values from producing unwanted rows. The equivalent (i.e. not returning NA rows for any NAs in V1 or V2) of the two options demonstrated above, without the which, would be:

 new.data <- data[ !is.na(data$V1 | data$V2) & ( data$V1 > 2 | data$V2 < 4) , ] 

Note: I want to thank the anonymous contributor who attempted to fix the error in the code immediately above, a fix that was rejected by the moderators. While correcting that first error, I noticed an additional one: the conditional clause that checks for NA values needs to come first if it is to be handled as I intended, since ...

> NA & 1
[1] NA
> 0 & NA
[1] FALSE

Order of arguments may matter when using '&'.
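To make the NA behavior discussed above concrete, here is a minimal sketch on a small made-up data frame (the column names and values are chosen purely for illustration):

```r
# Hypothetical example: row 2 has an NA in V1 and also fails the V2 condition
df <- data.frame(V1 = c(3, NA, 1), V2 = c(1, 5, 2))

# Plain logical indexing keeps an all-NA row for row 2,
# because NA | FALSE evaluates to NA
plain <- df[df$V1 > 2 | df$V2 < 4, ]

# which() silently drops the NA index, returning only rows 1 and 3
wch <- df[which(df$V1 > 2 | df$V2 < 4), ]
```

So the two approaches differ only when the combined condition evaluates to NA for some row.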


11 Comments

This is the highest voted question and then one finds: stackoverflow.com/questions/9860090/…
The advantage is compactness and ease of comprehension. The disadvantage is lack of utility in function-building tasks. If one wants to replicate this with [ one needs to wrap it in which or use additional !is.na constraints.
Is the 'which' required and if not why do you use it?
It's not "required", but you may get a different result if you leave out the which. If both V1 and V2 are NA you would get a row of NA's at that position if you left out the which. I work with large datasets and even a relatively small percentage of NA's will really fill up my screen with junk output. Some people think this is a feature. I don't.
how do you include a call to grepl or grep with this to also do pattern matching for desired rows, in addition to these conditionals?
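In answer to the comment about grepl: since grepl() returns a logical vector, it combines with | like any other condition. A sketch on hypothetical data (the column names and the "^abc" pattern are made up for illustration):

```r
# Hypothetical data: neither row passes the numeric conditions,
# but row 1's V3 matches the pattern
data <- data.frame(V1 = c(1, 1), V2 = c(5, 5), V3 = c("abc1", "xyz"))

# grepl() yields TRUE/FALSE per row, so it slots into the OR chain directly
new.data <- data[which(data$V1 > 2 | data$V2 < 4 | grepl("^abc", data$V3)), ]
```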

You are looking for "|". See http://cran.r-project.org/doc/manuals/R-intro.html#Logical-vectors

my.data.frame <- data[(data$V1 > 2) | (data$V2 < 4), ] 

1 Comment

This is NOT robust to the existence of NAs in a dataframe:
vc <- data.frame(duzey = factor(c("Y","O","Y","D","Y","Y","O"), levels = c("D","O","Y"), ordered = TRUE),
                 cinsiyet = c("E","E","K",NA,"K","E","K"),
                 yas = c(8,3,9,NA,7,NA,6),
                 Not = c(NA,1,1,NA,NA,2,1))
vc
vc[vc$cinsiyet == "E" | vc$Not < 4, ]
vc[vc$cinsiyet == "E" & vc$Not < 2, ]

Just for the sake of completeness, we can use the operators [ and [[:

set.seed(1)
df <- data.frame(v1 = runif(10), v2 = letters[1:10])

Several options

df[df[1] < 0.5 | df[2] == "g", ]
df[df[[1]] < 0.5 | df[[2]] == "g", ]
df[df["v1"] < 0.5 | df["v2"] == "g", ]

df$name is equivalent to df[["name", exact = FALSE]]
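The exact = FALSE note above refers to partial matching. A small sketch of the difference (a made-up one-column data frame for illustration):

```r
# Hypothetical data frame with a single column named "value"
df <- data.frame(value = 1:3)

df$val                      # $ partially matches "value", returns the column
df[["val", exact = FALSE]]  # same partial-matching behavior
df[["val"]]                 # exact matching by default: no such column, returns NULL
```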

Using dplyr:

library(dplyr)
filter(df, v1 < 0.5 | v2 == "g")

Using sqldf:

library(sqldf)
sqldf('SELECT * FROM df WHERE v1 < 0.5 OR v2 = "g"')

Output for the above options:

          v1 v2
1 0.26550866  a
2 0.37212390  b
3 0.20168193  e
4 0.94467527  g
5 0.06178627  j

2 Comments

how would you do this for 1 AND condition and 3 OR conditions contingent so for example: my.data.frame <- data[data$V3>10 & ((data$V1 > 2) | (data$V2 < 4) | (data$V4 <5), ]. When I do this it doesn't work
Wow! The sqldf package is too good. Very handy especially when subset() gets a bit painful :)
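Regarding the earlier comment about combining one AND condition with three OR conditions: the snippet there is missing a closing parenthesis, which is why it doesn't run. A sketch of the intended grouping on made-up data:

```r
# Made-up data: only row 1 satisfies both V3 > 10 and one of the OR-ed conditions
data <- data.frame(V1 = c(3, 1), V2 = c(5, 5), V3 = c(11, 12), V4 = c(9, 9))

# V3 > 10 must hold, AND at least one of the three OR-ed conditions
result <- data[data$V3 > 10 & (data$V1 > 2 | data$V2 < 4 | data$V4 < 5), ]
```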

In case anyone is looking for a scalable solution for testing multiple columns against the same condition, you can use Reduce or rowSums.

Sample Data

df <- base::expand.grid(x = c(0, 1), y = c(0, 1), z = c(0, 1))
df
#>   x y z
#> 1 0 0 0
#> 2 1 0 0
#> 3 0 1 0
#> 4 1 1 0
#> 5 0 0 1
#> 6 1 0 1
#> 7 0 1 1
#> 8 1 1 1

Solution

Does it contain any 0? This keeps every row except row 8, which contains only 1s.

The function `+` in Reduce() effectively works as an OR operator, since its result is > 0 if any value is TRUE.

## Reduce ---------------------------------------------------
df[Reduce(f = `+`, x = lapply(df, `==`, 0)) > 0, ]
#>   x y z
#> 1 0 0 0
#> 2 1 0 0
#> 3 0 1 0
#> 4 1 1 0
#> 5 0 0 1
#> 6 1 0 1
#> 7 0 1 1

## rowSums --------------------------------------------------
df[rowSums(df == 0) > 0, ]
#>   x y z
#> 1 0 0 0
#> 2 1 0 0
#> 3 0 1 0
#> 4 1 1 0
#> 5 0 0 1
#> 6 1 0 1
#> 7 0 1 1

Multiple AND-Conditions

Note that you can also easily use Reduce to apply multiple AND conditions, by using `*` instead of `+`. Multiplying the logicals returns a value > 0 only if all cases are TRUE.

df[Reduce(`*`, lapply(df, `==`, 0)) > 0, ]
#>   x y z
#> 1 0 0 0

Comments


A data.table option for completeness:

library(data.table)
dt <- data.table(V1 = runif(10, 0, 1), V2 = letters[1:10])
dt[V1 > 0.5 | V2 == "b", ]
#>           V1 V2
#> 1: 0.7294220  a
#> 2: 0.9717687  b
#> 3: 0.7177076  c
#> 4: 0.5963838  e
#> 5: 0.5456320  i

Created on 2022-07-10 by the reprex package (v2.0.1)

For more info about this useful package check this link.

Comments
