159

I have a vector such as a = c(1:10) and I need to remove multiple values from it, for example 2, 3 and 5.

How can I delete those values from the vector? (They are values, NOT positions in the vector.)

At the moment I loop over the values to remove and, for each one, do something like:

a <- a[a != NUMBER_TO_REMOVE]

But I suspect there is a function that does this directly.

9 Answers

230

The %in% operator tells you which elements are among the numbers to remove:

> a <- sample(1:10)
> remove <- c(2, 3, 5)
> a
[1] 10 5 2 7 1 6 3 4 8 9
> a %in% remove
[1] FALSE TRUE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
> a[!a %in% remove]
[1] 10 7 1 6 4 8 9

Note that NA, NaN and Inf values in a are kept unless they also appear in remove: %in% never returns NA, and a value that is not found in remove simply gives FALSE. Duplicate values in a are kept as well, as long as they are not listed in remove.
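A quick way to see how the negated %in% filter treats duplicates and non-finite values is to run it on a small vector. The values below are made up for illustration:

```r
a <- c(10, 5, 2, 2, 7, NA, Inf, 7)
remove <- c(2, 3, 5)

# %in% returns FALSE (never NA) for values that are not found in remove
a %in% remove

# so NA and Inf survive the negated filter, and so does every copy of 7
kept <- a[!a %in% remove]
kept    # 10 7 NA Inf 7
```

Both 7s are kept because 7 is not in remove. Only values listed in remove are dropped, however often they occur in a.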

  • %in% is a convenient shortcut for match(x, table, nomatch = 0L) > 0L, so the same filter can also be written with match directly. Values of a that are not in remove, including NA and Inf, get the nomatch value 0:

    > a <- c(a, NA, Inf)
    > a
    [1] 10 5 2 7 1 6 3 4 8 9 NA Inf
    > match(a, remove, nomatch = 0L, incomparables = 0L)
    [1] 0 3 1 0 0 0 2 0 0 0 0 0
    > a[match(a, remove, nomatch = 0L, incomparables = 0L) == 0L]
    [1] 10 7 1 6 4 8 9 NA Inf

    incomparables = 0L is not needed here. The argument lists values that cannot be matched (any element of a equal to 0 would get the nomatch value), so it has no effect on NA or Inf.
    This is, by the way, what setdiff does internally, except that setdiff also calls unique, which throws away duplicates in a that are not in remove.

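  • If remove itself contains NA, NaN, Inf or -Inf, no separate check is needed. match (and hence %in%) lets NA match NA and NaN match NaN, and Inf and -Inf each match themselves. For example, c(1, NA, Inf) %in% c(NA, Inf) gives FALSE TRUE TRUE. If you want an NA in remove to also drop NaN values from a (match keeps the two apart), check explicitly, e.g.

    if (any(is.na(remove))) a <- a[!is.na(a)]

    (is.na is TRUE for both NA and NaN. The R manual warns against relying on a difference between them anyway.)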
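Here is a small check of how match, and therefore %in%, handles NA, NaN and Inf when they appear in remove. The vectors are illustrative only:

```r
a <- c(1, NA, NaN, Inf, -Inf)
remove <- c(NA, Inf)

# NA matches only NA, NaN only NaN, Inf only Inf (and -Inf only -Inf)
a %in% remove                # FALSE TRUE FALSE TRUE FALSE

# the %in% filter drops NA and Inf but keeps NaN and -Inf
kept <- a[!a %in% remove]
kept                         # 1 NaN -Inf

# is.na() is TRUE for both NA and NaN, so adding it also drops NaN
kept_no_nan <- a[!is.na(a) & !a %in% remove]
kept_no_nan                  # 1 -Inf
```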


2 Comments

setdiff is better, as it does everything in one operation, and references the amended vector only once.
@Olexa: set difference is not always the same as removing all occurrences of a given set of numbers from a vector, because it will also remove duplicates in a that are not in remove. If that's not a problem, you can also use setdiff. setdiff, by the way, uses match, for which %in% is a shortcut.
119

You can use setdiff.

Given

a <- sample(1:10)
remove <- c(2, 3, 5)

Then

> a
[1] 10 8 9 1 3 4 6 7 2 5
> setdiff(a, remove)
[1] 10 8 9 1 4 6 7
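Note that setdiff also removes duplicates, because it calls unique() on its result. Here is a short sketch with a made-up vector that contains repeated values, showing the difference from the %in% filter:

```r
a <- c(4, 1, 4, 2, 3, 1)
remove <- c(2, 3, 5)

# setdiff() collapses repeated values to a single copy
set_result <- setdiff(a, remove)
set_result      # 4 1

# the %in% filter keeps every copy of values that are not in remove
in_result <- a[!a %in% remove]
in_result       # 4 1 4 1
```

If a is known to contain no duplicates (for example a sequence), the two give the same result.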

3 Comments

Very useful when a is the result of another function, so you can do it in one line instead of three lines plus a temporary variable.
This will produce different results from the %in% solution if the input vector contains duplicates (in which case setdiff will only return the unique set, i.e. without duplicates).
@docendodiscimus: fsetdiff from the data.table package has an all flag (default FALSE) that lets you keep duplicates in the input vector.
11

You can do it as follows:

> x <- c(2, 4, 6, 9, 10)   # the list
> y <- c(4, 9, 10)         # values to be removed
> idx = which(x %in% y)    # positions of the values of y in x
> idx
[1] 2 4 5
> x = x[-idx]              # remove those values using their positions and the "-" operator
> x
[1] 2 6

In short:

> x = x[ - which(x %in% y)] 
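There is one caveat with the which() form. If none of the values in y occur in x, which() returns integer(0), and x[-integer(0)] selects nothing, so the whole vector is lost. Here is a minimal sketch with an illustrative y that has no matches:

```r
x <- c(2, 4, 6, 9, 10)
y <- c(1, 3)                 # none of these values occur in x

idx <- which(x %in% y)       # integer(0): no positions found
dropped_all <- x[-idx]       # -integer(0) is still integer(0)
dropped_all                  # numeric(0)

# logical indexing keeps all five values
kept <- x[!x %in% y]
kept                         # 2 4 6 9 10
```

Logical indexing with !x %in% y does not have this problem, which is one reason to skip which() here.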

4 Comments

What you're calling a list in your example is a vector, right?
Yes, I mean the vector. Thanks for the comment.
There is no need for which here. It's basically the same as @cbeleites's answer.
Yes, it is similar, but different in a few respects. which returns the indexes of TRUE values, so the minus sign can be used to say "the indexes other than these indexes". Also, which is more readable since it is closer to natural language.
11

Instead of

x <- x[!x %in% c(2,3,5)]

you can use the purrr and magrittr packages:

library(purrr)
library(magrittr)
your_vector %<>% discard(~ .x %in% c(2,3,5))

This lets you subset while writing the vector name only once, and you can use it in pipes :)
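If you would rather not load purrr and magrittr, base R's Filter() applies a predicate to each element and keeps those for which it returns TRUE. This is a swapped-in base R alternative, not part of the answer above, and plain logical indexing is usually faster because it is vectorised:

```r
x <- c(1:10)

# Filter() calls the predicate once per element and keeps the TRUE ones
kept <- Filter(function(v) !(v %in% c(2, 3, 5)), x)
kept    # 1 4 6 7 8 9 10
```

With R 4.1 or later, the same call also fits the native pipe: x |> Filter(f = function(v) !(v %in% c(2, 3, 5))).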

2 Comments

Can you please explain your last statement about variable name length? Why don't you like that? Why is it better than the other way? Or remove that paragraph, since it is not related to the main question.
Probably worth noting that you might need to write purrr::discard as many users will have the scales package loaded too which also contains a discard function as I painfully found out.
4

First we can define a new operator,

"%ni%" = Negate( "%in%" ) 

Then it reads like "x not in remove":

x <- 1:10
remove <- c(2,3,5)
x <- x[ x %ni% remove ]

or, skipping the remove variable, use the values directly:

x <- x[ x %ni% c(2,3,5)] 
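%ni% is not a base R operator. It is the name defined in this answer. The same definition is more commonly written with backticks, and a quick check confirms it is just the negation of %in%:

```r
# backtick-quoted names are the usual way to define an infix operator
`%ni%` <- Negate(`%in%`)

x <- 1:10
kept <- x[x %ni% c(2, 3, 5)]
kept    # 1 4 6 7 8 9 10

# Negate() wraps the function, so x %ni% table equals !(x %in% table)
same <- identical(x %ni% c(2, 3, 5), !(x %in% c(2, 3, 5)))
same    # TRUE
```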

1 Comment

The question specifically says that 2, 3 and 5 are not positions in the vector.
3

There is also subset, which can be useful at times:

> a <- sample(1:10)
> bad <- c(2, 3, 5)
> subset(a, !(a %in% bad))
[1] 9 7 10 6 8 1 4


2

UPDATE:

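All of the above answers drop every occurrence of a value, even if it is listed only once in the values to remove. @BenBolker's suggestion using the duplicated() predicate handles the example below (see the comments for a case where it fails):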

full_vector[!full_vector %in% searched_vector | duplicated(full_vector)] 

Original answer: here is a small function for this:

exclude_val <- function(full_vector, searched_vector) {
  found = c()
  for (i in full_vector) {
    if (any(is.element(searched_vector, i))) {
      searched_vector[(which(searched_vector == i))[1]] = NA
    } else {
      found = c(found, i)
    }
  }
  return(found)
}

So, let's say full_vector = c(1,2,3,4,1) and searched_vector = c(1,2,3).

exclude_val(full_vector, searched_vector) will return c(4, 1), whereas the answers above return just 4.
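The same "remove each listed copy once" behaviour can be sketched without an explicit loop by comparing each element's running occurrence count with the number of times its value is listed for removal. remove_each_once is a hypothetical name, and the sketch assumes neither vector contains NA:

```r
# Hypothetical helper: removes each value only as many times as it is
# listed in searched_vector (assumes neither vector contains NA)
remove_each_once <- function(full_vector, searched_vector) {
  # running count per value: 1 for its first occurrence, 2 for its second, ...
  occurrence <- ave(seq_along(full_vector), full_vector, FUN = seq_along)
  # how many copies of each element's value are listed for removal
  n_listed <- vapply(full_vector, function(v) sum(searched_vector == v), integer(1))
  # keep an element once its value's removal quota is used up
  full_vector[occurrence > n_listed]
}

r1 <- remove_each_once(c(1, 2, 3, 4, 1), c(1, 2, 3))
r1    # 4 1
r2 <- remove_each_once(c(1, 1, 1, 2, 3), c(1, 1, 3))
r2    # 1 2
```

The second call is a case where the duplicated()-based expression above returns 1, 1, 2 instead.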

4 Comments

what about full_vector[!full_vector %in% searched_vector | duplicated(full_vector)] ?
@BenBolker ah I didn't know that "duplicated" predicate :(( now what, shall I delete my answer or change it to show only yours instead?
@BenBolker, your solution is wrong; just try: full_vector = c(1,1,1,2,3); searched_vector = c(1,1,3); - that produces 1, 1, 2 instead of the correct answer 1, 2.
Just to add a possible, correct solution for repeated values: removeif <- function(from, where) { for (i in where) if (i %in% from) {from = from[-match(i, from)]}; from}
1
q <- c(1,1,2,2,3,3,3,4,4,5,5,7,7)
rm <- q[11]
remove(rm)
q
q[13] = NaN
q
q %in% 7

This sets the 13th element of the vector to NaN (not a number), so q %in% 7 now shows FALSE at that position. If you try remove(q[c(11,12,13)]), you will see that the remove function does not work on elements of a vector. remove (the same function as rm) deletes entire objects, not single elements, and here it stops with an error.
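remove() is the same function as rm(). It deletes whole objects from an environment by name, so dropping elements requires indexing instead. Here is a small sketch of the difference, where tmp_value is a hypothetical stand-in for the answer's rm variable:

```r
q <- c(1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 7, 7)

tmp_value <- q[11]
remove(tmp_value)          # deletes the object tmp_value; q is untouched
exists("tmp_value")        # FALSE
length(q)                  # still 13

# passing an expression instead of a name is an error
res <- tryCatch(remove(q[c(11, 12, 13)]), error = function(e) "error")

# dropping elements is done by negative indexing instead
q <- q[-c(11, 12, 13)]
length(q)                  # 10
```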


1

Try this function

seq.int.exclude <- function(excluded, ...) {
  x <- seq.int(...)
  return(x[!(x %in% excluded)])
}

Call examples:

seq.int.exclude(from = 10L, to = 20L, excluded = c(12L, 30L, 19L))
seq.int.exclude(from = 10L, to = 20L, excluded = 15L)
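The function is repeated below so the snippet runs on its own. Because a sequence contains no repeated values, setdiff gives the same result here, which makes it an easy cross-check:

```r
seq.int.exclude <- function(excluded, ...) {
  x <- seq.int(...)
  x[!(x %in% excluded)]
}

r1 <- seq.int.exclude(from = 10L, to = 20L, excluded = c(12L, 30L, 19L))
r1    # 10 11 13 14 15 16 17 18 20

# no duplicates in a sequence, so setdiff() agrees with the %in% filter
r2 <- setdiff(seq.int(from = 10L, to = 20L), c(12L, 30L, 19L))
identical(r1, r2)    # TRUE
```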
