I am randomly sampling participants from an original data frame, and then I would like to create new data frames, each excluding one sample and keeping the rest. (Note that the real data frame is much larger, with more variables and more observations for each id.)
Sample df:

    id var1 var2
     1   10   15
     1   10   15
     2   11    4
     2   11    4
     3   12    4
     3   12    4
     4    9   10
     4    9   10

    #randomly sample two sets of id
    id <- as.numeric(as.character(df$id))
    fold1 <- as.data.frame(sample(id, 2, replace=TRUE))
    colnames(fold1) <- "id"
    fold2 <- as.data.frame(sample(id, 2, replace=TRUE))
    colnames(fold2) <- "id"

Desired output

df.new1:

    id var1 var2
     2   11    4
     2   11    4
     3   12    4
     3   12    4

df.new2:

    id var1 var2
     1   10   15
     1   10   15
     4    9   10
     4    9   10

I tried something along these lines, but there seems to be some issue with my syntax I can't quite figure out. If there's a dplyr implementation I would be really happy to see it.

    list = c(fold1, fold2)
    for(i in length(list)) {
      df.new <- as.data.frame(df[!(df$id %in% list[i]$id), ])
      assign(paste("df.new", i, sep="."), df.new)
    }

**Edit:** I slightly modified the example to reflect the fact that each draw should sample a proportion of the total number of ids, and that in total the number of ids sampled should equal the total number of ids in the df. So if there are 4 ids, each draw should contain 2 ids.
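
For what it's worth, two things in the attempt above stand out. `for(i in length(list))` only visits the last index (`seq_along(list)` would visit all of them), and `sample(id, 2, replace=TRUE)` draws from the repeated `df$id` column with replacement, so the same id can come up twice or land in both folds. Here is a minimal base R sketch of one way to read the requirement in the edit: shuffle the unique ids, deal them into folds, and drop one fold at a time. The names `n_folds`, `fold_of`, `folds` and `df_new` are made up for illustration, and the seed is only there to make the draw reproducible.

```r
# Toy data from the question
df <- data.frame(
  id   = c(1, 1, 2, 2, 3, 3, 4, 4),
  var1 = c(10, 10, 11, 11, 12, 12, 9, 9),
  var2 = c(15, 15, 4, 4, 4, 4, 10, 10)
)

set.seed(1)  # assumption: fixed seed, only so the draw is reproducible

n_folds <- 2              # hypothetical name: number of draws
ids     <- unique(df$id)  # sample ids, not rows

# Shuffle fold labels across the ids so every id lands in exactly one fold
fold_of <- sample(rep(seq_len(n_folds), length.out = length(ids)))
folds   <- split(ids, fold_of)

# For each fold, keep every row whose id is NOT in that fold
df_new <- lapply(folds, function(held_out) df[!(df$id %in% held_out), ])
names(df_new) <- paste0("df.new", seq_along(df_new))
```

Keeping the results in a named list (`df_new$df.new1`, `df_new$df.new2`) rather than calling `assign()` is a design choice, not a requirement; it tends to be easier to loop over later.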
`dplyr` has the methods `sample_n` (sample n rows) and `sample_frac` (sample a proportion of rows). Do they help?

`group_by(id)` with `sample_n`, but it didn't seem like it was sampling based on a random draw of id. But maybe there's a way to specify how it draws?

`id` at a time?
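
On the `sample_n` point raised in the comments: `group_by(id)` followed by `sample_n` samples rows within each id, which is probably why it didn't look like a random draw of ids. A possible dplyr sketch is to reduce the data to one row per id with `distinct()`, sample from that, and then use `anti_join()` to drop the sampled ids from the full data. This assumes two folds of half the ids each, as in the edited question; the seed and the object names are only for illustration.

```r
library(dplyr)

# Toy data from the question
df <- data.frame(
  id   = c(1, 1, 2, 2, 3, 3, 4, 4),
  var1 = c(10, 10, 11, 11, 12, 12, 9, 9),
  var2 = c(15, 15, 4, 4, 4, 4, 10, 10)
)

set.seed(42)  # assumption: fixed seed, only so the draw is reproducible

# One row per id, so sampling happens at the id level rather than the row level
ids <- df %>% distinct(id)

fold1 <- ids %>% sample_frac(0.5)            # half of the ids, without replacement
fold2 <- anti_join(ids, fold1, by = "id")    # the remaining ids

# Each new data frame excludes one fold and keeps all rows of the other ids
df.new1 <- anti_join(df, fold1, by = "id")
df.new2 <- anti_join(df, fold2, by = "id")
```

In dplyr 1.0.0 and later, `slice_sample(prop = 0.5)` is the newer spelling of `sample_frac(0.5)`. With more than two folds, the same idea should carry over by assigning a fold label to each row of `ids` and filtering on it.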