`grepl("instance|percentage", labelTest$Text)` returns TRUE if either "instance" or "percentage" is present.
How can I get TRUE only when both terms are present?
```r
Text <- c("instance", "percentage", "n", "instance percentage", "percentage instance")

grepl("instance|percentage", Text)
# TRUE TRUE FALSE TRUE TRUE

grepl("instance.*percentage|percentage.*instance", Text)
# FALSE FALSE FALSE TRUE TRUE
```

The latter one works by looking for:
('instance')(any character sequence)('percentage') OR ('percentage')(any character sequence)('instance')

Naturally, if you need to match a combination of more than two words, this gets complicated quickly, because every possible word order needs its own branch (6 orderings for three words, 24 for four). In that case the solution mentioned in the comments, combining separate `grepl()` calls with `&`, is easier to implement and read.
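To make that growth concrete, here is a minimal sketch that builds the alternation pattern for any number of words by joining every ordering with `.*`. The helpers `permutations()` and `alternation_pattern()` are hypothetical, written only for illustration, and assume plain words without regex metacharacters. Two words reproduce the two-branch pattern used above; three words already need six branches.

```r
Text <- c("instance", "percentage", "n", "instance percentage", "percentage instance")

# Hypothetical helper: every ordering of a character vector (n! of them)
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (rest in permutations(v[-i])) {
      out[[length(out) + 1]] <- c(v[i], rest)
    }
  }
  out
}

# Hypothetical helper: join each ordering with ".*" and OR the orderings together
alternation_pattern <- function(words) {
  branches <- vapply(permutations(words), paste, character(1), collapse = ".*")
  paste(branches, collapse = "|")
}

alternation_pattern(c("instance", "percentage"))
# "instance.*percentage|percentage.*instance"

grepl(alternation_pattern(c("instance", "percentage")), Text)
# FALSE FALSE FALSE TRUE TRUE

length(permutations(c("instance", "percentage", "element")))
# 6
```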
Another alternative, useful when matching many words, is positive look-ahead, which can be thought of as a "non-consuming" match. For this you have to enable Perl-compatible regexes with `perl = TRUE`.
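As a small illustration on the short `Text` vector from the first example: each look-ahead below scans ahead for one word without consuming characters, so word order does not matter. Running `regexpr()` on a matching string reports a match length of 0, which is what "non-consuming" means here. This sketch assumes single-line strings, since `.` does not match a newline by default.

```r
Text <- c("instance", "percentage", "n", "instance percentage", "percentage instance")

# Each (?=.*word) only checks that "word" occurs somewhere ahead; it consumes no
# characters, so the second look-ahead starts from the same position as the first
both <- grepl("(?=.*instance)(?=.*percentage)", Text, perl = TRUE)
both
# FALSE FALSE FALSE TRUE TRUE

# The overall match is zero-width: it starts at position 1 with length 0
m <- regexpr("(?=.*instance)(?=.*percentage)", "percentage instance", perl = TRUE)
attr(m, "match.length")
# 0
```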
```r
# create a vector of word combinations
set.seed(1)
words <- c("instance", "percentage", "element", "character", "n", "o", "p")
Text2 <- replicate(10, paste(sample(words, 5), collapse = " "))

# grepl with multiple positive look-aheads
longperl <- grepl("(?=.*instance)(?=.*percentage)(?=.*element)(?=.*character)", Text2, perl = TRUE)

# this is equivalent to the solution proposed in the comments
longstrd <- grepl("instance", Text2) & grepl("percentage", Text2) &
  grepl("element", Text2) & grepl("character", Text2)

# they produce identical results
identical(longperl, longstrd)
# TRUE
```

Furthermore, if you have the patterns stored in a vector, you can condense the expressions significantly:
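```r
pat <- c("instance", "percentage", "element", "character")

longperl <- grepl(paste0("(?=.*", pat, ")", collapse = ""), Text2, perl = TRUE)

# sapply() gives a logical matrix (one column per pattern); subtracting 1 turns
# TRUE into 0 and FALSE into -1, so a row sums to 0 only if every pattern matched
longstrd <- rowSums(sapply(pat, grepl, Text2) - 1L) == 0L
```

As asked for in the comments, if you want to match exact words only, i.e. not substrings, you can specify word boundaries with `\\b`. For example: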
```r
tx <- c("cent element", "percentage element", "element cent", "element centimetre")

grepl("(?=.*\\bcent\\b)(?=.*element)", tx, perl = TRUE)
# TRUE FALSE TRUE FALSE

grepl("element", tx) & grepl("\\bcent\\b", tx)
# TRUE FALSE TRUE FALSE
```

Using this with "instance" and "table" also seems to capture cases like "marketable". I tried adding `"\\stable"` to include a space before "table", but that doesn't work either. Any suggestions?

Use `\\b` instead to indicate a word boundary; it should work.

This is how you get TRUE only if both terms occur in an element of the vector `labelTest$Text`. I think this is the exact answer to the question, and much shorter than the other solutions:
```r
grepl("instance", labelTest$Text) & grepl("percentage", labelTest$Text)
```

Use `intersect()` and feed it a `grep()` for each word:
```r
# base R only: grep() returns the positions that match each pattern,
# and intersect() keeps the positions common to both
vector_of_text[
  intersect(
    grep("pattern1", vector_of_text),
    grep("pattern2", vector_of_text)
  )
]
```
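The same idea extends to any number of words by folding `intersect()` over one `grep()` result per pattern. Here is a minimal base R sketch; the contents of `vector_of_text` and `patterns` are made-up example data:

```r
vector_of_text <- c("instance percentage", "instance only", "percentage instance", "neither")
patterns <- c("instance", "percentage")

# One grep() per pattern gives a list of matching positions;
# Reduce(intersect, ...) keeps only positions that matched every pattern
hits <- Reduce(intersect, lapply(patterns, grep, x = vector_of_text))
vector_of_text[hits]
# "instance percentage" "percentage instance"
```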
Why not `grep` once with "instance", then do the same with "percentage", get the results (as TRUE/FALSE) and combine them?

`labelTest$label[grep("instance", labelTest$Text)] <- "combination1"`, so doing one with "instance" and another with "percentage" won't work.

`labelTest$label[grep("instance", labelTest$Text) & grep("percentage", labelTest$Text)] <- "combination1"` is what @agerom was suggesting, and it should work.

That gives: `longer object length is not a multiple of shorter object length`
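That warning comes from `grep()` returning match positions rather than one TRUE/FALSE per row, so `&` compares two index vectors (recycling them when their lengths differ) instead of combining per-row matches. A minimal sketch with a made-up `labelTest` data frame (the `Text` and `label` columns are assumed from the question) shows the difference and the `grepl()` version of the assignment:

```r
# Hypothetical example data shaped like the question's labelTest
labelTest <- data.frame(
  Text = c("instance percentage", "instance", "percentage", "percentage instance", "n"),
  label = NA_character_,
  stringsAsFactors = FALSE
)

# grep() returns positions, not one logical per row
grep("instance", labelTest$Text)
# 1 2 4

# grepl() returns one TRUE/FALSE per row, so `&` combines them row by row
both <- grepl("instance", labelTest$Text) & grepl("percentage", labelTest$Text)
labelTest$label[both] <- "combination1"
labelTest$label
# "combination1" NA NA "combination1" NA
```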