
I have a data frame with a variable containing elements to drop if they match an element in another variable. See a small example below:

    df <- data.frame(pair = c(1, 1, 2, 2, 3, 3),
                     animal = rep(c("dog", "cat"), 3),
                     value = seq(1, 12, 2),
                     drop = c("no", "no", "dog", "dog", "cat", "cat"))

      pair animal value drop
    1    1    dog     1   no
    2    1    cat     3   no
    3    2    dog     5  dog
    4    2    cat     7  dog
    5    3    dog     9  cat
    6    3    cat    11  cat

I want to filter the data frame according to whether the value of animal matches the value of drop. I want something like filter(df, animal != drop) to remove only the rows where the value of animal matches the value of drop:

      pair animal value drop
    1    1    dog     1   no
    2    1    cat     3   no
    4    2    cat     7  dog
    5    3    dog     9  cat

I also tried writing a simple loop to test whether animal matches drop for each row and remove the row if true, but I couldn't get it working. (I'm not very confident with loops and would prefer not to use one if possible as my data frame is very large but I was getting desperate!)

    for(i in nrow(df)){
      if(df$animal[i] == df$drop[i]){
        df <- df[-i,]
        return(df)
      }
    }

Is there a way of doing this using dplyr?

2 Answers


Your use of filter(df, animal != drop) is correct. However, because you didn't specify stringsAsFactors = FALSE in your data.frame() call, all strings are converted to factors (the default before R 4.0.0), and comparing two factors with different level sets raises an error. Adding stringsAsFactors = FALSE should solve this:

    library(dplyr)

    df <- data.frame(pair = c(1, 1, 2, 2, 3, 3),
                     animal = rep(c("dog", "cat"), 3),
                     value = seq(1, 12, 2),
                     drop = c("no", "no", "dog", "dog", "cat", "cat"),
                     stringsAsFactors = FALSE)

    df %>% filter(animal != drop)

      pair animal value drop
    1    1    dog     1   no
    2    1    cat     3   no
    3    2    cat     7  dog
    4    3    dog     9  cat
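
To make the factor problem concrete, here is a minimal base R sketch. It assumes the columns are factors (forced here with stringsAsFactors = TRUE, so it behaves the same on any R version); the names cmp, keep and result are only illustrative.

```r
# Build the example with factor columns, as data.frame() did by default before R 4.0.0.
df <- data.frame(pair = c(1, 1, 2, 2, 3, 3),
                 animal = rep(c("dog", "cat"), 3),
                 value = seq(1, 12, 2),
                 drop = c("no", "no", "dog", "dog", "cat", "cat"),
                 stringsAsFactors = TRUE)

# The two factors do not share the same levels: drop has the extra level "no".
print(levels(df$animal))
print(levels(df$drop))

# Comparing the factors directly fails, so capture the error message instead of stopping.
cmp <- tryCatch(df$animal != df$drop, error = function(e) conditionMessage(e))
print(cmp)

# Comparing the underlying strings works row by row.
keep <- as.character(df$animal) != as.character(df$drop)
result <- df[keep, ]
print(result)
```

The comparison on character vectors is vectorised, so no loop is needed even for a large data frame.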

To avoid this unwanted string-to-factor conversion, I highly recommend using tibble, which never converts strings to factors.
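
As a rough sketch of the tibble route, assuming the tibble and dplyr packages are installed, the same data can be built with tibble() and filtered directly. The names df_tbl and result are only illustrative.

```r
library(tibble)
library(dplyr)

# tibble() leaves string columns as character vectors.
df_tbl <- tibble(pair = c(1, 1, 2, 2, 3, 3),
                 animal = rep(c("dog", "cat"), 3),
                 value = seq(1, 12, 2),
                 drop = c("no", "no", "dog", "dog", "cat", "cat"))

# Both columns are character, so the comparison needs no conversion.
result <- df_tbl %>% filter(animal != drop)
print(result)
```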

In case you cannot change how the data is created, here is @akrun's solution:

    library(dplyr)

    df %>%
      mutate_at(vars(animal, drop), as.character) %>%
      filter(animal != drop)
    #   pair animal value drop
    # 1    1    dog     1   no
    # 2    1    cat     3   no
    # 3    2    cat     7  dog
    # 4    3    dog     9  cat

3 Comments

This is a good solution in this example, but consider cases where you don't have control over how the data frame is being created. In my daily work, I rarely create data frames like this—my work would be easier if I did! Maybe you can add in how you would deal with the fact that this is currently a factor
That was my issue, thank you - my original data frame is a tibble but I hadn't considered that one of my variables was factor class. filter(animal != as.character(drop)) worked perfectly without having to remove the factors.
Added @akrun's solution

An option would be to convert the columns to character class with mutate_at and then use filter to keep the non-matching rows:

    library(dplyr)

    df %>%
      mutate_at(vars(animal, drop), as.character) %>%
      filter(animal != drop)
    #   pair animal value drop
    # 1    1    dog     1   no
    # 2    1    cat     3   no
    # 3    2    cat     7  dog
    # 4    3    dog     9  cat
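
In dplyr 1.0.0 and later, mutate_at() is superseded by across(), so the same idea could be sketched as below. This assumes dplyr 1.0.0 or newer and factor columns (forced here with stringsAsFactors = TRUE); the name result is only illustrative.

```r
library(dplyr)

# Recreate the example with factor columns.
df <- data.frame(pair = c(1, 1, 2, 2, 3, 3),
                 animal = rep(c("dog", "cat"), 3),
                 value = seq(1, 12, 2),
                 drop = c("no", "no", "dog", "dog", "cat", "cat"),
                 stringsAsFactors = TRUE)

# Convert both columns to character, then keep rows where they differ.
result <- df %>%
  mutate(across(c(animal, drop), as.character)) %>%
  filter(animal != drop)
print(result)
```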
