Filter rows based on group with multiple conditions in R

Question

I am working on a data frame of plant scientific names, a sample of which is as follows:

```r
plantlist <- data.frame(
  ID = c(1, 2, 2, 2, 2, 2, 2),
  SciName = c("Alkanna tuberculata", "Alkanna tuberculata",
              "Anchusa tinctoria", "Anchusa tinctoria",
              "Anchusa tinctoria", "Anchusa tinctoria",
              "Echium italicum"),
  SciName.w.author = c("Alkanna tuberculata Greuter", "Alkanna tuberculata Meikle",
                       "Anchusa tinctoria L", "Anchusa tinctoria Woodv",
                       "Anchusa tinctoria Pall", "Anchusa tinctoria Meikle",
                       "Echium italicum"),
  Status = c("Unresolved", "Misapplied", "Accepted", "Synonym",
             "Unresolved", "Synonym", "Misapplied")
)
```

What I need to do is group the rows by ID and SciName and then keep rows as follows:

if a group has only one row, keep it, whatever its Status is;
if a group has more than one row, keep its Accepted and Synonym rows;
if a group has no Accepted or Synonym rows, keep its Unresolved rows, and if it has no Unresolved rows either, keep its Misapplied rows.
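The three rules can be read as one decision per group, made on that group's Status values. Below is a minimal base-R sketch of that reading. keep_rows() is a hypothetical helper name, not something from the question; the sketch assumes Status has no NA values, and it leaves out the author column because the rules only look at Status.

```r
# Sample data from the question (author column omitted; the rules only use Status)
plantlist <- data.frame(
  ID = c(1, 2, 2, 2, 2, 2, 2),
  SciName = c("Alkanna tuberculata", "Alkanna tuberculata",
              "Anchusa tinctoria", "Anchusa tinctoria",
              "Anchusa tinctoria", "Anchusa tinctoria",
              "Echium italicum"),
  Status = c("Unresolved", "Misapplied", "Accepted", "Synonym",
             "Unresolved", "Synonym", "Misapplied")
)

# Hypothetical helper: given the Status values of ONE group,
# return a logical vector saying which of its rows to keep.
keep_rows <- function(status) {
  if (length(status) == 1) {
    return(TRUE)                                  # rule 1: a lone row is always kept
  }
  if (any(status %in% c("Accepted", "Synonym"))) {
    return(status %in% c("Accepted", "Synonym"))  # rule 2
  }
  if (any(status == "Unresolved")) {
    return(status == "Unresolved")                # rule 3: first fallback
  }
  status == "Misapplied"                          # rule 3: last fallback
}

# One factor level per (ID, SciName) combination that actually occurs
groups <- interaction(plantlist$ID, plantlist$SciName, drop = TRUE)

# Apply the helper group by group, then put the results back in row order
keep <- unsplit(lapply(split(plantlist$Status, groups), keep_rows), groups)

keep.plantlist <- plantlist[keep, ]
print(keep.plantlist)
```

Returning early at each rule mirrors the priority order: a later fallback is reached only when no row in the group matched an earlier one.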

I tried to do this with case_when() and grouping, but I'm stuck on the last rule:

```r
library(dplyr)

keep.plantlist <- plantlist %>%
  group_by(ID, SciName) %>%
  mutate(count = n()) %>%
  ungroup() %>%
  mutate(keep = case_when(count == 1 ~ TRUE,
                          count > 1 & Status == "Accepted" ~ TRUE,
                          count > 1 & Status == "Synonym" ~ TRUE))

# expected keep column:
# plantlist$keep <- c(T, F, T, T, F, T, T)
```
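One likely reason the attempt above stalls: case_when() returns NA, not FALSE, for any row where none of its conditions holds, so the Unresolved and Misapplied rows in larger groups come out as NA. A small illustration, assuming dplyr is installed:

```r
library(dplyr)

status <- c("Accepted", "Unresolved", "Misapplied")

# Without a catch-all formula, rows that match nothing come back as NA
no_default <- case_when(status == "Accepted" ~ TRUE)

# A final TRUE ~ FALSE formula turns every unmatched row into FALSE
with_default <- case_when(status == "Accepted" ~ TRUE,
                          TRUE ~ FALSE)

print(no_default)    # c(TRUE, NA, NA)
print(with_default)  # c(TRUE, FALSE, FALSE)
```

dplyr 1.1.0 and later also accept a `.default` argument for the same purpose. Note too that the attempt calls ungroup() before case_when(), so a group-level check such as any(Status == "Accepted") placed there would look at the whole data frame rather than at one group.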

I also tried converting Status to a factor and arranging each group in the priority order I need, but I don't know of a function that can use that order to pick the rows.
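The priority idea can work without a special function: map each Status to a rank in which Accepted and Synonym share the top rank, then keep, within each group, only the rows at the group's best (lowest) rank. A single-row group is then kept automatically, because its only row is at its own best rank. A plain factor() ordering is not used here because a factor would give Accepted and Synonym different levels. This is a sketch assuming dplyr and assuming Status only ever takes these four values (any other value maps to NA and would drop its whole group); rank_of is a hypothetical name.

```r
library(dplyr)

# Sample data from the question (author column omitted; the rules only use Status)
plantlist <- data.frame(
  ID = c(1, 2, 2, 2, 2, 2, 2),
  SciName = c("Alkanna tuberculata", "Alkanna tuberculata",
              "Anchusa tinctoria", "Anchusa tinctoria",
              "Anchusa tinctoria", "Anchusa tinctoria",
              "Echium italicum"),
  Status = c("Unresolved", "Misapplied", "Accepted", "Synonym",
             "Unresolved", "Synonym", "Misapplied")
)

# Hypothetical priority lookup: lower rank = preferred.
# Accepted and Synonym share rank 1 so they are kept together.
rank_of <- c(Accepted = 1L, Synonym = 1L, Unresolved = 2L, Misapplied = 3L)

keep.plantlist <- plantlist %>%
  # as.character() guards against Status being a factor, where indexing
  # by the factor would use its integer codes instead of its labels
  mutate(rank = unname(rank_of[as.character(Status)])) %>%
  group_by(ID, SciName) %>%
  filter(rank == min(rank)) %>%   # keep only the best-ranked rows per group
  ungroup() %>%
  select(-rank)

print(keep.plantlist)
```

With group_by(ID, SciName), as in the question, the Misapplied Alkanna tuberculata row has ID 2, so it forms a group of its own and is kept. The expected vector c(T, F, T, T, F, T, T) drops that row, which is what group_by(SciName) alone gives on this sample; which grouping is intended depends on the real IDs.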

Please give the expected result, keep.plantlist, for benchmarking. – Peace Wang, commented Jan 22, 2022 at 4:04
As far as I can tell, the expected result for this input is TRUE for everything. Please edit your example to include some FALSE cases, and show the expected output. – Gregor Thomas, commented Jan 22, 2022 at 4:08
@Gregor Thomas I have different combinations of rows in each group (up to 17 rows per group). A group may, for example, contain both Accepted and Misapplied rows, or a combination of three or even all four statuses. For such groups I just want to keep the Accepted and Synonym rows (if any) and omit the Misapplied or Unresolved ones. – ayeh, commented Jan 22, 2022 at 4:43
@Gregor Thomas Yes, you're right, this was not a good sample. I have edited it. – ayeh, commented Jan 22, 2022 at 4:50
@Peace Wang Thank you for the suggestion. I added the expected keep column. – ayeh, commented Jan 22, 2022 at 4:50

Gregor Thomas · Accepted Answer · 2022-01-22 04:09:18Z

I think this will work, but I'd need a higher-quality test set to be sure.

```r
library(dplyr)

keep.plantlist <- plantlist %>%
  group_by(ID, SciName) %>%
  mutate(count = n()) %>%
  mutate(keep = case_when(
    count == 1 ~ TRUE,
    count > 1 & Status == "Accepted" ~ TRUE,
    count > 1 & Status == "Synonym" ~ TRUE,
    # no Accepted/Synonym rows in the group: keep the Unresolved ones
    !any(Status %in% c("Accepted", "Synonym")) & Status %in% "Unresolved" ~ TRUE,
    # no Accepted/Synonym/Unresolved rows in the group: keep the Misapplied ones
    !any(Status %in% c("Accepted", "Synonym", "Unresolved")) & Status %in% "Misapplied" ~ TRUE,
    TRUE ~ FALSE
  ))
```
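The answer's pipeline adds a keep column rather than dropping rows. To finish the job, one can filter on that column and then remove the helper columns. A sketch assuming dplyr; it re-runs the answer's logic on the sample (with the column spelled Status, and the author column omitted) so the block is self-contained, and the names flagged and keep.plantlist are only illustrative.

```r
library(dplyr)

plantlist <- data.frame(
  ID = c(1, 2, 2, 2, 2, 2, 2),
  SciName = c("Alkanna tuberculata", "Alkanna tuberculata",
              "Anchusa tinctoria", "Anchusa tinctoria",
              "Anchusa tinctoria", "Anchusa tinctoria",
              "Echium italicum"),
  Status = c("Unresolved", "Misapplied", "Accepted", "Synonym",
             "Unresolved", "Synonym", "Misapplied")
)

# The answer's logic: flag the rows to keep within each (ID, SciName) group
flagged <- plantlist %>%
  group_by(ID, SciName) %>%
  mutate(count = n()) %>%
  mutate(keep = case_when(
    count == 1 ~ TRUE,
    count > 1 & Status == "Accepted" ~ TRUE,
    count > 1 & Status == "Synonym" ~ TRUE,
    !any(Status %in% c("Accepted", "Synonym")) & Status %in% "Unresolved" ~ TRUE,
    !any(Status %in% c("Accepted", "Synonym", "Unresolved")) & Status %in% "Misapplied" ~ TRUE,
    TRUE ~ FALSE
  )) %>%
  ungroup()

# Drop the rows flagged FALSE, then the helper columns
keep.plantlist <- flagged %>%
  filter(keep) %>%
  select(-count, -keep)

print(keep.plantlist)
```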
