ddply with which.max function

Question

You may be able to help me: for each ID, I wish to extract the largest "a" value among the rows that have the largest "b" value. In other words, I wish to scan through the "b" values and identify the highest (here b=40). If several "a" values share that highest "b" value (here a=2 and a=30), then I wish to select the highest "a" value (here a=30).

Here is what I have done so far:

    df <- data.frame(ID=c('1','1','1','1','1','1'),
                     a=c('10','20','30','10','2','30'),
                     b=c('10','20','30','10','40','40'))
    library(plyr)
    opt <- ddply(df, .(ID), summarise, a=a[which.max(b)])
    opt
    ID a
    1  2

but I don't get the expected output:

    ID a
    1  30

I'd greatly appreciate your suggestions. Note that contrary to this sample dataset, the actual dataset I work on is pretty large. Thank you very much!

which.max gives you the index. For the value just use max.
– James, Commented Sep 24, 2018 at 9:32
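To make the comment above concrete, here is a minimal base R sketch of why the question's call returns 2. It assumes the columns are numeric (the question stores them as strings), and the variable names `first_max`, `a_first` and `a_best` are only illustrative.

```r
# Sample data from the question, with numeric columns (an assumption;
# the question stores these values as strings).
df <- data.frame(ID = rep("1", 6),
                 a  = c(10, 20, 30, 10, 2, 30),
                 b  = c(10, 20, 30, 10, 40, 40))

# which.max() returns the position of the FIRST maximum only,
# so the second row with b = 40 is never considered.
first_max <- which.max(df$b)   # 5
a_first   <- df$a[first_max]   # 2, the value the question got

# Keep every row tied for the largest b, then take the largest a among them.
a_best <- max(df$a[df$b == max(df$b)])   # 30
```

Inside the question's ddply call, the same idea would presumably be written as a = max(a[b == max(b)]), provided a and b are numeric.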

Ronak Shah · Accepted Answer · 2018-09-24 09:15:38Z

We can use dplyr: sort by b and then a in descending order, and then take the first row of each group (ID).

    library(dplyr)
    df %>%
      group_by(ID) %>%
      arrange(desc(b), desc(a)) %>%
      slice(1)

    #  ID    a     b
    #  <fct> <fct> <fct>
    #1 1     30    40
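One caveat worth checking on the real dataset: the output above shows a and b as factors, because the sample data stores the numbers as strings. Sorting then follows text or level order, which happens to work for these values but may not in general. A minimal sketch of converting the columns first, assuming they really are meant to be numbers:

```r
# Sample data as given in the question (values stored as strings).
df <- data.frame(ID = c('1','1','1','1','1','1'),
                 a  = c('10','20','30','10','2','30'),
                 b  = c('10','20','30','10','40','40'))

# Text comparison goes character by character, so "9" sorts after "10".
"9" > "10"   # TRUE

# Going through as.character() first makes the conversion safe for both
# factor and character columns; as.numeric() on a factor alone would
# return the internal level codes instead of the printed values.
df$a <- as.numeric(as.character(df$a))
df$b <- as.numeric(as.character(df$b))
str(df)
```

After this step, desc(b) and desc(a) sort by numeric value.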

As shown in the expected output, if we need only the ID and a columns, we can just select them:

    df %>%
      group_by(ID) %>%
      arrange(desc(b), desc(a)) %>%
      slice(1) %>%
      select(ID, a)

We can also arrange them in ascending order and then select the last row of each group using n():

    library(dplyr)
    df %>%
      group_by(ID) %>%
      arrange(b, a) %>%
      slice(n()) %>%
      select(ID, a)
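For comparison, the same sort-then-take-first idea can be sketched in base R with order() and duplicated(), which avoids any package dependency. This is only an illustration: the columns are assumed to be numeric, and the rows for ID "2" are hypothetical, added to show the grouping.

```r
# Question's data with numeric columns, plus a hypothetical second ID.
df <- data.frame(ID = c(rep("1", 6), rep("2", 3)),
                 a  = c(10, 20, 30, 10, 2, 30, 5, 7, 6),
                 b  = c(10, 20, 30, 10, 40, 40, 1, 3, 3))

# Sort by ID, then b descending, then a descending.
ord <- df[order(df$ID, -df$b, -df$a), ]

# The first row of each ID now has the largest b, with ties on b
# broken by the largest a; duplicated() flags every later row.
opt <- ord[!duplicated(ord$ID), c("ID", "a")]
rownames(opt) <- NULL
opt
```

Here opt holds a = 30 for ID "1" and a = 7 for ID "2".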

Thank you very much for your time and support, it's working perfectly
