
I have a data.frame in R. I want to test two different conditions on two different columns, but I want these conditions to be inclusive. Therefore, I would like to use "OR" to combine them. I have used the following syntax before, with a lot of success, when I wanted an "AND" condition.

my.data.frame <- data[(data$V1 > 2) & (data$V2 < 4), ] 

But I don't know how to use an 'OR' in the above.


5 Answers

my.data.frame <- subset(data , V1 > 2 | V2 < 4) 

An alternative solution that mimics the behavior of subset and is more appropriate for use inside a function body:

new.data <- data[ which( data$V1 > 2 | data$V2 < 4) , ] 

Some people criticize the use of which as unnecessary, but it does prevent NA values from producing unwanted rows. The equivalent (i.e. not returning NA rows for any NAs in V1 or V2) of the two options demonstrated above, without the which, would be:

 new.data <- data[ !is.na(data$V1 | data$V2) & ( data$V1 > 2 | data$V2 < 4) , ] 

Note: I want to thank the anonymous contributor who attempted to fix the error in the code immediately above, a fix that was rejected by the moderators. While correcting that first error, I noticed an additional one: the conditional clause that checks for NA values needs to come first if it is to be handled as I intended, since ...

> NA & 1
[1] NA
> 0 & NA
[1] FALSE

Order of arguments may matter when using '&'.
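To make the NA behavior discussed above concrete, here is a minimal sketch on a small made-up data frame (the column names and values are chosen purely for illustration):

```r
# Hypothetical example: row 2 has an NA in V1 and also fails the V2 condition
df <- data.frame(V1 = c(3, NA, 1), V2 = c(1, 5, 2))

# Plain logical indexing keeps an all-NA row for row 2,
# because NA | FALSE evaluates to NA
plain <- df[df$V1 > 2 | df$V2 < 4, ]

# which() silently drops the NA index, returning only rows 1 and 3
wch <- df[which(df$V1 > 2 | df$V2 < 4), ]
```

So the two approaches differ only when the combined condition evaluates to NA for some row.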


11 Comments

This is the highest voted question and then one finds: stackoverflow.com/questions/9860090/…
The advantage is compactness and ease of comprehension. The disadvantage is lack of utility in function-building tasks. If one wants to replicate this with [ one needs to wrap it in which or use additional !is.na constraints.
Is the 'which' required and if not why do you use it?
It's not "required", but you may get a different result if you leave out the which. If both V1 and V2 are NA you would get a row of NA's at that position if you left out the which. I work with large datasets and even a relatively small percentage of NA's will really fill up my screen with junk output. Some people think this is a feature. I don't.
how do you include a call to grepl or grep with this to also do pattern matching for desired rows, in addition to these conditionals?
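In answer to the comment about grepl: since grepl() returns a logical vector, it combines with | like any other condition. A sketch on hypothetical data (the column names and the "^abc" pattern are made up for illustration):

```r
# Hypothetical data: neither row passes the numeric conditions,
# but row 1's V3 matches the pattern
data <- data.frame(V1 = c(1, 1), V2 = c(5, 5), V3 = c("abc1", "xyz"))

# grepl() yields TRUE/FALSE per row, so it slots into the OR chain directly
new.data <- data[which(data$V1 > 2 | data$V2 < 4 | grepl("^abc", data$V3)), ]
```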

You are looking for "|". See http://cran.r-project.org/doc/manuals/R-intro.html#Logical-vectors

my.data.frame <- data[(data$V1 > 2) | (data$V2 < 4), ] 

1 Comment

This is NOT robust to the existence of NAs in a dataframe:
vc <- data.frame(duzey = factor(c("Y","O","Y","D","Y","Y","O"), levels = c("D","O","Y"), ordered = TRUE),
                 cinsiyet = c("E","E","K",NA,"K","E","K"),
                 yas = c(8,3,9,NA,7,NA,6),
                 Not = c(NA,1,1,NA,NA,2,1))
vc
vc[vc$cinsiyet == "E" | vc$Not < 4, ]
vc[vc$cinsiyet == "E" & vc$Not < 2, ]

Just for the sake of completeness, we can use the operators [ and [[:

set.seed(1)
df <- data.frame(v1 = runif(10), v2 = letters[1:10])

Several options

df[df[1] < 0.5 | df[2] == "g", ]
df[df[[1]] < 0.5 | df[[2]] == "g", ]
df[df["v1"] < 0.5 | df["v2"] == "g", ]

df$name is equivalent to df[["name", exact = FALSE]]
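The exact = FALSE note above refers to partial matching. A small sketch of the difference (a made-up one-column data frame for illustration):

```r
# Hypothetical data frame with a single column named "value"
df <- data.frame(value = 1:3)

df$val                      # $ partially matches "value", returns the column
df[["val", exact = FALSE]]  # same partial-matching behavior
df[["val"]]                 # exact matching by default: no such column, returns NULL
```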

Using dplyr:

library(dplyr)
filter(df, v1 < 0.5 | v2 == "g")

Using sqldf:

library(sqldf)
sqldf('SELECT * FROM df WHERE v1 < 0.5 OR v2 = "g"')

Output for the above options:

          v1 v2
1 0.26550866  a
2 0.37212390  b
3 0.20168193  e
4 0.94467527  g
5 0.06178627  j

2 Comments

how would you do this for 1 AND condition and 3 OR conditions contingent so for example: my.data.frame <- data[data$V3>10 & ((data$V1 > 2) | (data$V2 < 4) | (data$V4 <5), ]. When I do this it doesn't work
Wow! The sqldf package is too good. Very handy especially when subset() gets a bit painful :)
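Regarding the earlier comment about combining one AND condition with three OR conditions: the snippet there is missing a closing parenthesis, which is why it doesn't run. A sketch of the intended grouping on made-up data:

```r
# Made-up data: only row 1 satisfies both V3 > 10 and one of the OR-ed conditions
data <- data.frame(V1 = c(3, 1), V2 = c(5, 5), V3 = c(11, 12), V4 = c(9, 9))

# V3 > 10 must hold, AND at least one of the three OR-ed conditions
result <- data[data$V3 > 10 & (data$V1 > 2 | data$V2 < 4 | data$V4 < 5), ]
```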

In case anyone is looking for a scalable solution for testing multiple columns against the same condition, you can use Reduce or rowSums.

Sample Data

df <- base::expand.grid(x = c(0, 1), y = c(0, 1), z = c(0, 1))
df
#>   x y z
#> 1 0 0 0
#> 2 1 0 0
#> 3 0 1 0
#> 4 1 1 0
#> 5 0 0 1
#> 6 1 0 1
#> 7 0 1 1
#> 8 1 1 1

Solution

Does it contain any 0? This keeps every row except row 8, which contains only 1s.

The function `+` in Reduce() effectively works as an OR operator, since its result is > 0 if any value is TRUE.

## Reduce ---------------------------------------------------
df[Reduce(f = `+`, x = lapply(df, `==`, 0)) > 0, ]
#>   x y z
#> 1 0 0 0
#> 2 1 0 0
#> 3 0 1 0
#> 4 1 1 0
#> 5 0 0 1
#> 6 1 0 1
#> 7 0 1 1

## rowSums --------------------------------------------------
df[rowSums(df == 0) > 0, ]
#>   x y z
#> 1 0 0 0
#> 2 1 0 0
#> 3 0 1 0
#> 4 1 1 0
#> 5 0 0 1
#> 6 1 0 1
#> 7 0 1 1

Multiple AND-Conditions

Note that you can also easily use Reduce to apply multiple AND conditions, by using `*` instead of `+`. Multiplying the logicals returns a value > 0 only if all cases are TRUE.

df[Reduce(`*`, lapply(df, `==`, 0)) > 0, ]
#>   x y z
#> 1 0 0 0

Comments


A data.table option for completeness:

library(data.table)
dt <- data.table(V1 = runif(10, 0, 1), V2 = letters[1:10])
dt[V1 > 0.5 | V2 == "b", ]
#>           V1 V2
#> 1: 0.7294220  a
#> 2: 0.9717687  b
#> 3: 0.7177076  c
#> 4: 0.5963838  e
#> 5: 0.5456320  i

Created on 2022-07-10 by the reprex package (v2.0.1)

For more info about this useful package check this link.

Comments
