2

I have a dataframe df with columns ID, Year, Value1, Value2, Value3 and 21788928 rows. I need to subset the data by Year and IDand find the max Value1 in that subset saving the rest of information of that row, I need to do that for all the combinations of Year and ID(Year goes from 1982 to 2013, ID is from 1 to 28371)

I was trying to do that in a double for loop:

year<-seq(1982, 2013) cnt=1 for (i in 1:32) { for (j in 1:28371) A<-df[df$Year==year[i]&df$ID==j,] maxVal[cnt,]<-A[A$Value1==max(A$Value1),] cnt=cnt+1 } } 

but it takes way to long. Is there a more efficient way to do that? Maybe using ddply or with.

1
  • Please post a sample of the data, and the desired result Commented Aug 1, 2015 at 15:50

2 Answers 2

5

A base R solution with aggregate:

prov <- aggregate(. ~ Year + ID, data = dat, FUN = max) 
Sign up to request clarification or add additional context in comments.

1 Comment

@Liza you are welcome. Consider to accept the answer that fits your goals and next time if you have to add clarifications please do edit your post and do not post an answer (unless of course you find by yourself the solution for your problems )
4

You can use dplyr

library(dplyr) dat %>% group_by(ID, Year) %>% summarise(mval=max(Value1)) -> result 

or plyr, keeping all the other columns (and repeating max Value1 as mval)

ddply(dat, .(ID, Year), function(x) { transform(x[which.max(x$Value1),], mval=Value1) }, .drop=F) 

Data

dat <- data.frame(ID=sample(1:10, 100, rep=T), Year=sample(1995:2000, 100, rep=T), Value1=runif(100)) 

2 Comments

Thank you! It did exactly what I wanted i just few min.
Or library(data.table);setDT(dat)[, list(mval= max(Value1)) , .(ID, Year)]

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.