Suppose I have a tibble with a grouping variable and a logical variable that indicates whether a row is a primary response for that group.
I want to do the following:
- If any row in a `group` is marked as `is_primary`, keep that row but none of the others in the group
- If no row in `group` is marked with `is_primary`, keep them all
- Filter the rows based on the above
Here is some example data:

    library(tidyverse)

    data <- tibble(
      group      = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
      is_primary = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE),
      value      = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
    )

In the above example, I'd like to keep all the A rows, because there is no row with is_primary==TRUE, keep only the second B row, and keep the last two C rows.
I thought the obvious solution would be something like:

    data %>%
      group_by(group) %>%
      mutate(keep_row = ifelse(any(is_primary), is_primary, TRUE))

But this results in the following, which doesn't meet the criteria above:

    # A tibble: 9 x 4
    # Groups:   group [3]
      group is_primary value keep_row
      <chr> <lgl>      <dbl> <lgl>
    1 A     FALSE          1 TRUE
    2 A     FALSE          2 TRUE
    3 A     FALSE          3 TRUE
    4 B     FALSE          4 FALSE
    5 B     TRUE           5 FALSE
    6 C     FALSE          6 FALSE
    7 C     FALSE          7 FALSE
    8 C     TRUE           8 FALSE
    9 C     TRUE           9 FALSE

However, if I first create an intermediate variable that indicates whether the group has a primary row, it works.

    data %>%
      group_by(group) %>%
      mutate(has_primary = ifelse(any(is_primary), TRUE, FALSE)) %>%
      mutate(keep_row = ifelse(has_primary, is_primary, TRUE))

This results in keep_row being correct:

    # A tibble: 9 x 5
    # Groups:   group [3]
      group is_primary value has_primary keep_row
      <chr> <lgl>      <dbl> <lgl>       <lgl>
    1 A     FALSE          1 FALSE       TRUE
    2 A     FALSE          2 FALSE       TRUE
    3 A     FALSE          3 FALSE       TRUE
    4 B     FALSE          4 TRUE        FALSE
    5 B     TRUE           5 TRUE        TRUE
    6 C     FALSE          6 TRUE        FALSE
    7 C     FALSE          7 TRUE        FALSE
    8 C     TRUE           8 TRUE        TRUE
    9 C     TRUE           9 TRUE        TRUE

What is going on in ifelse that makes the first solution fail?
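
One way to narrow this down is to look at a single group outside of dplyr. The sketch below uses only base R and takes group B's `is_primary` values from the example; the names `short`, `full`, and `keep_row` are just illustrative. It suggests the difference comes from the length of the `test` argument: `ifelse()` returns a result shaped like `test`, and `any()` always produces a single value.

```r
# Group B from the example data
is_primary <- c(FALSE, TRUE)

# any() collapses the group to one value, so `test` has length 1 and
# ifelse() returns only the first element of `yes`.
short <- ifelse(any(is_primary), is_primary, TRUE)
print(length(short))

# With one test value per row (which is what the recycled has_primary
# column provides), ifelse() returns one result per row.
full <- ifelse(rep(any(is_primary), length(is_primary)), is_primary, TRUE)
print(full)

# The same rule written as a plain logical expression, without ifelse()
keep_row <- is_primary | !any(is_primary)
print(keep_row)
```

If that reading is right, the single value from the first version is then recycled across the whole group by `mutate()`, which would explain the all-FALSE `keep_row` for B and C. In the grouped pipeline, the last expression could presumably be used directly, for example `filter(is_primary | !any(is_primary))`.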