I am working with a data frame of plant scientific names, a sample of which follows:

plantlist <- data.frame(
  ID = c(1, 2, 2, 2, 2, 2, 2),
  SciName = c("Alkanna tuberculata", "Alkanna tuberculata", "Anchusa tinctoria",
              "Anchusa tinctoria", "Anchusa tinctoria", "Anchusa tinctoria",
              "Echium italicum"),
  SciName.w.author = c("Alkanna tuberculata Greuter", "Alkanna tuberculata Meikle",
                       "Anchusa tinctoria L", "Anchusa tinctoria Woodv",
                       "Anchusa tinctoria Pall", "Anchusa tinctoria Meikle",
                       "Echium italicum"),
  Status = c("Unresolved", "Misapplied", "Accepted", "Synonym",
             "Unresolved", "Synonym", "Misapplied")
)

What I need to do is group the rows by ID and SciName, and then keep the following rows:

  1. if there is only one row in the group, keep it, no matter what the status is
  2. if there is more than one row, keep the accepted names and synonyms
  3. if there are no accepted names or synonyms, keep the unresolved rows, and if there are no unresolved rows, keep the misapplied ones

I tried to do this with case_when and grouping, but I'm stuck on the last part:

library(dplyr)

keep.plantlist <- plantlist %>%
  group_by(ID, SciName) %>%
  mutate(count = n()) %>%
  ungroup() %>%
  mutate(keep = case_when(
    count == 1 ~ TRUE,
    count > 1 & Status == "Accepted" ~ TRUE,
    count > 1 & Status == "Synonym" ~ TRUE
  ))

# expected keep column
plantlist$keep <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)

I also tried converting Status to a factor and arranging each group by the priority I need, but I don't know whether there is a function that could use that order to pick the rows.

  • Please give an expected result keep.plantlist for benchmark use. Commented Jan 22, 2022 at 4:04
  • As far as I can tell, the expected result for this input is TRUE for everything. Please edit your example to include some false cases, and show the expected output. Commented Jan 22, 2022 at 4:08
  • @Gregor Thomas I have different combinations of rows for each group (up to 17 rows per group). A group may contain both accepted and misapplied rows, for example, or combinations of three or even four statuses. For such groups I just want to keep the accepted names and the synonyms (if any) and omit the misapplied or unresolved ones. Commented Jan 22, 2022 at 4:43
  • @Gregor Thomas Yes, you're right. This was not a good sample. I edited the sample. Commented Jan 22, 2022 at 4:50
  • @Peace Wang thank you for the suggestion. I added an expected row. Commented Jan 22, 2022 at 4:50

1 Answer

I think this will work, but it needs a more thorough test set to be sure.

library(dplyr)

keep.plantlist <- plantlist %>%
  group_by(ID, SciName) %>%
  mutate(count = n()) %>%
  # the group stays active, so any() below is evaluated per (ID, SciName) group
  mutate(keep = case_when(
    count == 1 ~ TRUE,
    count > 1 & Status == "Accepted" ~ TRUE,
    count > 1 & Status == "Synonym" ~ TRUE,
    !any(Status %in% c("Accepted", "Synonym")) & Status %in% "Unresolved" ~ TRUE,
    !any(Status %in% c("Accepted", "Synonym", "Unresolved")) & Status %in% "Misapplied" ~ TRUE,
    TRUE ~ FALSE
  ))
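The priority idea from the question can also be written directly, as a rough sketch rather than a tested solution: give each status a numeric rank, with Accepted and Synonym sharing the top rank, and keep the rows whose rank equals the best rank in their group. The lookup vector `status_rank` and the `rank` column are names made up for this illustration. Note one assumption about the data: in the sample as posted, row 2 has ID 2 and is therefore alone in its group, so the expected `keep` vector can only be reproduced if that row belongs to ID 1, which is what the sketch below uses.

```r
library(dplyr)

# Sample data. ASSUMPTION: row 2 is given ID 1 (the question has ID 2 there)
# so that the grouping matches the expected keep vector from the question.
plantlist <- data.frame(
  ID = c(1, 1, 2, 2, 2, 2, 2),
  SciName = c("Alkanna tuberculata", "Alkanna tuberculata", "Anchusa tinctoria",
              "Anchusa tinctoria", "Anchusa tinctoria", "Anchusa tinctoria",
              "Echium italicum"),
  Status = c("Unresolved", "Misapplied", "Accepted", "Synonym",
             "Unresolved", "Synonym", "Misapplied")
)

# Hypothetical priority table: lower is better; Accepted and Synonym tie.
status_rank <- c(Accepted = 1, Synonym = 1, Unresolved = 2, Misapplied = 3)

keep.plantlist <- plantlist %>%
  # look up each row's rank by name; a status missing from the table gives NA
  mutate(rank = unname(status_rank[as.character(Status)])) %>%
  group_by(ID, SciName) %>%
  # keep every row that has the best (lowest) rank present in its group
  mutate(keep = rank == min(rank)) %>%
  ungroup()

# Compare against the expected column given in the question.
expected <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE)
stopifnot(identical(keep.plantlist$keep, expected))
```

A single-row group needs no special case here, because its only row always has the group's minimum rank. Adding `filter(keep)` at the end of the pipeline would drop the unwanted rows instead of only flagging them.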