How to delete multiple values from a vector?

Question

I have a vector like a = c(1:10) and I need to remove multiple values, e.g. 2, 3 and 5.

How can I delete those numbers (they are values, NOT positions in the vector) from the vector?

At the moment I use a loop and, for each number, do something like:

a <- a[a != NUMBER_TO_REMOVE]

But I think there is a function that does this in one step.

cbeleites · Accepted Answer · 2019-01-26 10:18:19Z

The %in% operator tells you which elements are among the numbers to remove:

```
> a <- sample (1 : 10)
> remove <- c (2, 3, 5)
> a
[1] 10 5 2 7 1 6 3 4 8 9
> a %in% remove
[1] FALSE TRUE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
> a [! a %in% remove]
[1] 10 7 1 6 4 8 9
```

Note that this keeps NA, NaN and Inf values in a (unless they are listed in remove, %in% simply returns FALSE for them), and it also keeps duplicate values in a as long as they are not listed in remove.

%in% is a convenient shortcut for match, so the same subsetting can be written with match directly, telling it to return 0 for non-matches:
```
> a <- c (a, NA, Inf)
> a
[1] 10 5 2 7 1 6 3 4 8 9 NA Inf
> match (a, remove, nomatch = 0L)
[1] 0 3 1 0 0 0 2 0 0 0 0 0
> a [match (a, remove, nomatch = 0L) == 0L]
[1] 10 7 1 6 4 8 9 NA Inf
```
There is no need for the incomparables argument here, since NA and Inf in a simply find no match in remove (and incomparables = 0L would in fact stop a 0 in a from ever matching).
This is, btw., what setdiff does internally (plus a final unique, which also throws away duplicates in a that are not in remove).
If remove contains NA, NaN or Inf, match (and therefore %in%) handles them as well: NA matches only NA, NaN matches only NaN, and Inf and -Inf match themselves. If you want an NA in remove to drop NaN values in a too, check for missing values explicitly, e.g.
```
if (any (is.na (remove))) a <- a [! is.na (a)] 
```
(is.na does not distinguish NA from NaN, but the R manual anyway warns that one should not rely on a difference between them.)

Inf and -Inf need no special checks.
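A small deterministic check of how %in% treats NA, NaN, Inf and repeated values may help here. The vector below is made up for illustration and is fixed (instead of using sample()) so the results are reproducible:

```r
# Fixed example vector (instead of sample()) so the result is reproducible
a <- c(10, 7, 5, 2, 7, 5, NA, NaN, Inf)
remove <- c(2, 3, 5)

# NA, NaN and Inf are not in `remove`, so %in% gives FALSE for them: they are kept.
# The repeated 7 is kept too; both copies of 5 go, because 5 is listed in `remove`.
kept <- a[!a %in% remove]
print(kept)    # values: 10 7 7 NA NaN Inf

# An NA in the removal set matches NA only (not NaN); Inf matches Inf
kept2 <- a[!a %in% c(NA, Inf)]
print(kept2)   # values: 10 7 5 2 7 5 NaN
```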

setdiff is better, as it does everything in one operation, and references the amended vector only once.
@Olexa: set difference is not always the same as removing all occurrences of a given set of numbers from a vector: it will also remove duplicates in a that are not in remove. If that's not a problem, you can also use setdiff. setdiff, btw, uses match, for which %in% is a shortcut.

Brian Diggs · Accepted Answer · 2012-07-26 15:39:39Z

119

You can use setdiff.

Given

```
a <- sample(1:10)
remove <- c(2, 3, 5)
```

Then

```
> a
[1] 10 8 9 1 3 4 6 7 2 5
> setdiff(a, remove)
[1] 10 8 9 1 4 6 7
```


3 Comments

jf328 Over a year ago

Very useful when a is the result of another function: you can do it in one line instead of three lines and a temporary variable.

talat Over a year ago

This will produce different results than the %in% solution if the input vector contains duplicates (in which case setdiff will only return the unique set, i.e. without duplicates)
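To see the difference this comment describes, here is a small made-up example with repeated values; it also checks that setdiff() matches de-duplicating the %in% result:

```r
a <- c(4, 1, 4, 2, 1, 3)
remove <- c(2, 3)

by_in   <- a[!a %in% remove]   # keeps repeated values:  4 1 4 1
by_diff <- setdiff(a, remove)  # also drops the repeats: 4 1

# setdiff() gives the same result as de-duplicating the %in% version
stopifnot(identical(by_diff, unique(by_in)))
```

So setdiff is fine when a has no repeats (or you want them collapsed); otherwise the %in% subsetting preserves them.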

Juergen Over a year ago

@docendodiscimus: fsetdiff from the data.table package has an all argument (default FALSE) that lets you keep duplicates in the input.

ibilgen · Accepted Answer · 2013-09-19 00:02:14Z

11

You can do it as follows:

```
> x<-c(2, 4, 6, 9, 10) # the list
> y<-c(4, 9, 10) # values to be removed
> idx = which(x %in% y ) # Positions of the values of y in x
> idx
[1] 2 4 5
> x = x[-idx] # Remove those values using their position and "-" operator
> x
[1] 2 6
```

Shortly

> x = x[ - which(x %in% y)]
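One caveat with the negative-index form, sketched with made-up values: when none of the values in y occur in x, which() returns integer(0), and x[-integer(0)] selects nothing rather than everything. Logical indexing does not have this problem:

```r
x <- c(2, 4, 6, 9, 10)
y <- c(3, 5)                  # none of these values occur in x

idx <- which(x %in% y)        # integer(0): no positions found
wrong <- x[-idx]              # -integer(0) is still integer(0), so this selects nothing
right <- x[!x %in% y]         # logical indexing keeps every element, as intended

# If you prefer which(), guard against the empty case explicitly
safe <- if (length(idx) > 0) x[-idx] else x

print(length(wrong))          # 0
print(right)                  # 2 4 6 9 10
```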


4 Comments

patrick Over a year ago

what you're calling a list in your example is a vector, right?

ibilgen Over a year ago

Yes I mean the vector. Thanks for the comment.

David Arenburg Over a year ago

There is no need for which here. It's basically the same as @cbeleites' answer.

ibilgen Over a year ago

Yes, it is similar, but different in a few respects: which returns the indexes of the TRUE values, so the minus sign can be used to say "the indexes other than these". Also, which is more readable since it is closer to natural language.

NelsonGon · Accepted Answer · 2019-07-07 12:24:28Z

11

instead of

x <- x[! x %in% c(2,3,5)]

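using the packages purrr and magrittr, you can do:

```
library(purrr)     # provides discard()
library(magrittr)  # provides the assignment pipe %<>%

your_vector %<>% discard(~ .x %in% c(2,3,5))
```

This lets you subset while writing the vector name only once. And you can use it in pipes :)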

edited Jul 7, 2019 at 12:24

NelsonGon


answered Oct 29, 2016 at 12:50

krishan404


2 Comments

rodrigoap Over a year ago

can you please explain your last statement about variables name length? Why you don't like that? Why is better than the other way? Or, remove that paragraph since is not related to the main issue/question.

user63230 Over a year ago

Probably worth noting that you might need to write purrr::discard, as many users will have the scales package loaded too, which also contains a discard function, as I painfully found out.
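If the scales/purrr name clash mentioned in the comment above is a concern, base R's Filter() gives the same discard-style filtering without loading either package. A minimal sketch, assuming a plain atomic vector (the values are made up):

```r
x <- c(1, 2, 3, 4, 5, 6)

# Filter() keeps the elements for which the predicate returns TRUE,
# so negating the %in% test discards the listed values
kept <- Filter(function(v) !(v %in% c(2, 3, 5)), x)

print(kept)   # 1 4 6
```

Filter() calls the predicate once per element, so on large vectors the vectorized x[!x %in% c(2, 3, 5)] is simpler and faster.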

TheMI · Accepted Answer · 2015-07-08 14:07:53Z

First we can define a new operator,

"%ni%" = Negate( "%in%" )

Then x %ni% remove reads like "x not in remove":

```
x <- 1:10
remove <- c(2,3,5)
x <- x[ x %ni% remove ]
```

Or, without the intermediate remove variable, go directly:

x <- x[ x %ni% c(2,3,5)]

The question specifically says that 2, 3 and 5 are not positions in the vector.

Karolis Koncevičius · Accepted Answer · 2018-10-09 17:45:04Z

There is also subset, which might be useful sometimes:

```
> a <- sample(1:10)
> bad <- c(2, 3, 5)
> subset(a, !(a %in% bad))
[1] 9 7 10 6 8 1 4
```

Özgür · Accepted Answer · 2015-04-03 13:33:17Z

UPDATE:

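None of the answers above work for repeated values: they remove every occurrence of a listed value. @BenBolker's suggestion using the duplicated() predicate handles this, as long as searched_vector itself contains no repeated values (see the comments below):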

full_vector[!full_vector %in% searched_vector | duplicated(full_vector)]

Original answer: here is a little function for this:

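```
exclude_val <- function(full_vector, searched_vector) {
  found <- c()
  for (i in full_vector) {
    if (any(is.element(searched_vector, i))) {
      # "use up" the first matching entry of searched_vector
      searched_vector[(which(searched_vector == i))[1]] <- NA
    } else {
      # no unused match left: keep this element
      found <- c(found, i)
    }
  }
  return(found)
}
```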

So, let's say full_vector = c(1,2,3,4,1) and searched_vector = c(1,2,3).

exclude_val(full_vector, searched_vector) will return c(4, 1), whereas the answers above return just 4.

what about full_vector[!full_vector %in% searched_vector | duplicated(full_vector)] ?
@BenBolker ah I didn't know that "duplicated" predicate :(( now what, shall I delete my answer or change it to show only yours instead?
@BenBolker, your solution is wrong; just try: full_vector = c(1,1,1,2,3); searched_vector = c(1,1,3); - that produces 1, 1, 2 instead of the correct answer 1, 2.
Just to add a possible, correct solution for repeated values: removeif <- function(from, where) { for (i in where) if (i %in% from) {from = from[-match(i, from)]}; from}
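A loop-free alternative for the repeated-values case discussed in these comments is sketched below. The helper name multiset_remove is hypothetical, the occurrence counting with base R's ave() is a technique swapped in here (not taken from any answer), and it assumes neither vector contains NA:

```r
# Hypothetical helper: remove values as a multiset difference, i.e. each
# occurrence in `remove` cancels at most one occurrence in `x` (earliest first).
# Assumption: neither `x` nor `remove` contains NA.
multiset_remove <- function(x, remove) {
  # for each element: how many times its value has occurred so far (1, 2, 3, ...)
  occ <- ave(seq_along(x), x, FUN = seq_along)
  # for each element: how many times its value is listed in `remove`
  n_rm <- vapply(x, function(v) sum(remove == v), integer(1))
  # drop the first n_rm occurrences of each value, keep the rest
  x[occ > n_rm]
}

multiset_remove(c(1, 1, 1, 2, 3), c(1, 1, 3))   # 1 2
multiset_remove(c(1, 2, 3, 4, 1), c(1, 2, 3))   # 4 1
```

vapply() compares every element of x against all of remove, so this is O(length(x) * length(remove)); that is fine for small inputs but a counting approach based on table() would likely scale better.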

Alex Marculescu · Accepted Answer · 2016-11-12 07:37:16Z

```
q <- c(1,1,2,2,3,3,3,4,4,5,5,7,7)
rm <- q[11]
remove(rm)
q
q[13] = NaN
q
q %in% 7
```

This sets the 13th element of the vector (a 7) to NaN (not a number), so q %in% 7 shows FALSE at that position. If you try remove(q[c(11,12,13)]), you will see that the remove function does not work on vector elements: it removes entire objects (whole variables), not single elements.
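To make this concrete: rm() and its alias remove() delete whole objects by name, not elements of a vector. A minimal sketch with a small made-up vector (tmp is just an illustrative name):

```r
q <- c(1, 1, 2, 2, 3)

tmp <- q[5]      # copies one element into a new variable
rm(tmp)          # deletes the variable `tmp`; q itself is unchanged

# rm()/remove() only accept object names, so passing an element is an error
res <- tryCatch({ rm(q[1]); "removed" }, error = function(e) "error")

# To drop an element, subset the vector instead
q2 <- q[-5]
```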

Armando Contestabile · Accepted Answer · 2022-09-16 06:44:05Z

Try this function

```
seq.int.exclude <- function(excluded, ...) {
  x <- seq.int(...)
  return(x[!(x %in% excluded)])
}
```

Call examples:

```
seq.int.exclude(from = 10L, to = 20L, excluded = c(12L, 30L, 19L))
seq.int.exclude(from = 10L, to = 20L, excluded = 15L)
```
