With the following sample dataframe I would like to draw a stratified random sample (e.g., 40%) of the ID's "ID" from each level of the factor "Cohort":
data<-structure(list(Cohort = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), ID = structure(1:20, .Label = c("a1 ", "a2", "a3", "a4", "a5", "a6", "a7", "a8", "a9", "b10", "b11", "b12", "b13", "b14", "b15", "b16", "b17", "b18", "b19", "b20" ), class = "factor")), .Names = c("Cohort", "ID"), class = "data.frame", row.names = c(NA, -20L)) I only know how to draw a random number of rows using the following:
library(dplyr) data %>% group_by(Cohort) %>% sample_n(size = 10) But my actual data are longitudinal so I have multiple cases of the same ID within each cohort and several cohorts of different sizes, thus the need to select a proportion of unique ID's. Any assistance would be appreciated.