I want to randomly extract n rows from a data frame for each value of one column. Take this example:

    # Reproducible example
    df <- as.data.frame(matrix(0, 2e+6, 2))
    df$V1 <- runif(nrow(df), 0, 1)
    df$V2 <- sample(c(1:10), nrow(df), replace = TRUE)
    df$V3 <- sample(c("A", "B", "C"), nrow(df), replace = TRUE)

I want to extract, for example, n = 10 rows for each value of V2.

    # Example of what I need with one value of V2
    df1 <- df[which(df$V2 == 1), ]
    str(df1)
    df1[sample(1:nrow(df1), 10), ]

I do not want to use a for loop, so I tried this line with tapply:

    df_objective <- tapply(df$V1, df$V2, function(x) df[sample(1:nrow(df),10),"V2"])

which is close to what I want, but I lose the third column of the data frame.
I tried this to get complete subsets:

    df_objective <- by(cbind(df$V1,df$V3), df$V2, function(x) df[sample(1:nrow(df),10),"V2"])

but it does not help.
How can I keep all the columns in the subsets?
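
For reference, here is a minimal sketch of one way this could work without an explicit for loop. It is only an illustration, not necessarily the best or fastest approach. It assumes every value of V2 has at least n rows, and it uses a smaller version of the example data (2e4 rows instead of 2e+6) so that it runs quickly. The names `n`, `pieces` and `sampled` are introduced here for illustration only. The idea is to split the whole data frame by V2 and sample row indices inside each piece, rather than from `df` as both attempts above do:

```r
# Smaller version of the example data so the sketch runs quickly
set.seed(42)
df <- data.frame(
  V1 = runif(2e4, 0, 1),
  V2 = sample(1:10, 2e4, replace = TRUE),
  V3 = sample(c("A", "B", "C"), 2e4, replace = TRUE)
)

n <- 10  # rows to draw per value of V2 (assumes each group has at least n rows)

# split() returns one data frame per value of V2; each piece keeps all columns
pieces <- split(df, df$V2)

# Sample n row indices inside each piece (not from the whole df),
# then stack the sampled pieces back into a single data frame
sampled <- lapply(pieces, function(x) x[sample(nrow(x), n), ])
df_objective <- do.call(rbind, sampled)

str(df_objective)
table(df_objective$V2)
```

The same idea should carry over to `by()`: passing the whole data frame, as in `by(df, df$V2, function(x) x[sample(nrow(x), n), ])`, hands each subset to the function as `x`, so indexing `x` instead of `df` keeps every column.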