I am working on a data frame of plant scientific names, a sample of which is as follows:

```r
plantlist <- data.frame(
  ID = c(1, 2, 2, 2, 2, 2, 2),
  SciName = c("Alkanna tuberculata", "Alkanna tuberculata", "Anchusa tinctoria",
              "Anchusa tinctoria", "Anchusa tinctoria", "Anchusa tinctoria",
              "Echium italicum"),
  SciName.w.author = c("Alkanna tuberculata Greuter", "Alkanna tuberculata Meikle",
                       "Anchusa tinctoria L", "Anchusa tinctoria Woodv",
                       "Anchusa tinctoria Pall", "Anchusa tinctoria Meikle",
                       "Echium italicum"),
  Status = c("Unresolved", "Misapplied", "Accepted", "Synonym",
             "Unresolved", "Synonym", "Misapplied")
)
```

What I need to do is group the rows by ID and SciName and then keep the following rows:
- if there is only one row in the group keep it, no matter what the status is
- if there is more than one row, keep the Accepted and Synonym rows
- if there are no Accepted or Synonym rows, keep the Unresolved rows, and if there are no Unresolved rows, keep the Misapplied rows
I tried to accomplish this using case_when and grouping, but I'm stuck on the last part:

```r
library(dplyr)

keep.plantlist <- plantlist %>%
  group_by(ID, SciName) %>%
  mutate(count = n()) %>%
  ungroup() %>%
  mutate(keep = case_when(
    count == 1 ~ TRUE,
    count > 1 & Status == "Accepted" ~ TRUE,
    count > 1 & Status == "Synonym" ~ TRUE
  ))

# expected keep column
plantlist$keep <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)
```

I also tried converting Status to a factor and arranging the groups by the priority I need, but I don't know whether there is a function that can make use of that order.
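For illustration, here is a minimal sketch of one way the priority idea could be expressed: map each status to a numeric tier and, within each group, keep the rows whose tier equals the group's best tier. The names `status_rank`, `flag_keep`, and `tier` are hypothetical, and the tier values are an assumption read off the rules above (Accepted and Synonym share the top tier). Note that in the sample as posted, row 2 is the only row with ID 2 and SciName "Alkanna tuberculata", so grouping by ID and SciName keeps it; the expected vector above is reproduced when grouping by SciName alone.

```r
library(dplyr)

plantlist <- data.frame(
  ID = c(1, 2, 2, 2, 2, 2, 2),
  SciName = c("Alkanna tuberculata", "Alkanna tuberculata", "Anchusa tinctoria",
              "Anchusa tinctoria", "Anchusa tinctoria", "Anchusa tinctoria",
              "Echium italicum"),
  Status = c("Unresolved", "Misapplied", "Accepted", "Synonym",
             "Unresolved", "Synonym", "Misapplied")
)

# Assumed priority tiers (hypothetical helper): lower value = higher priority.
status_rank <- c(Accepted = 1, Synonym = 1, Unresolved = 2, Misapplied = 3)

# Hypothetical helper: flag the rows to keep within each group.
# Grouping columns are passed through `...` to group_by().
flag_keep <- function(df, ...) {
  df %>%
    group_by(...) %>%
    mutate(
      # Look up the tier by name; as.character() guards against factor columns.
      tier = unname(status_rank[as.character(Status)]),
      # A single-row group is always kept; otherwise keep the best tier present.
      # A status missing from status_rank yields NA in a multi-row group.
      keep = n() == 1 | tier == min(tier)
    ) %>%
    ungroup() %>%
    select(-tier)
}

by_name    <- flag_keep(plantlist, SciName)      # grouping by SciName only
by_id_name <- flag_keep(plantlist, ID, SciName)  # grouping as described in the text

by_name$keep
by_id_name$keep
```

To drop the unwanted rows instead of flagging them, the result can be passed to `filter(keep)`.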