Selecting random portions of a dataframe

Question

My dataset is a series of surveys. Each survey is divided up into several time periods and each time period has several observations. Each line in the dataset is a single observation. It looks something like this:

    Survey Period Observation
    1.1    1      A
    1.1    1      A
    1.1    1      B
    1.1    2      A
    1.1    2      B
    1.2    1      A
    1.2    2      B
    1.2    3      C
    1.2    4      D

This is a simplified version of my dataset, but it demonstrates the point (several periods for each survey, several observations for each period). What I want to do is make a dataframe consisting of all the observations from a single, randomly selected, period in each survey, so that in the resulting dataframe each survey only has a single period, but all of the associated observations. I'm completely stumped on this one and don't even know where to start.

Thanks for your help

You can start here: stackoverflow.com/questions/25937466/…
– R Yoda, Commented Nov 11, 2015 at 17:48

AntoniosK · Accepted Answer · 2015-11-11 18:14:30Z

If I've understood correctly, for each survey you need to randomly select one period only and then get all corresponding observations. There might be alternative ways, but I'm using a dplyr approach.

    dt = read.table(text="Survey Period Observation
    1.1 1 A
    1.1 1 A
    1.1 1 B
    1.1 2 A
    1.1 2 B
    1.2 1 A
    1.2 2 B
    1.2 3 C
    1.2 4 D", header=TRUE)

    library(dplyr)

    set.seed(49)  ## just to be able to replicate the process exactly

    dt %>%
      select(Survey, Period) %>%                 ## select relevant columns
      distinct() %>%                             ## keep unique combinations
      group_by(Survey) %>%                       ## for each survey
      sample_n(1) %>%                            ## sample only one period
      ungroup() %>%                              ## forget about the grouping
      inner_join(dt, by=c("Survey","Period"))    ## get corresponding observations

    #   Survey Period Observation
    #    (dbl)  (int)      (fctr)
    # 1    1.1      1           A
    # 2    1.1      1           A
    # 3    1.1      1           B
    # 4    1.2      2           B
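On dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(). The following is a minimal sketch of the same idea with the newer verb, assuming such a version is installed. The names chosen and result are only illustrative and are not part of the original answer.

```r
library(dplyr)

dt <- read.table(text = "Survey Period Observation
1.1 1 A
1.1 1 A
1.1 1 B
1.1 2 A
1.1 2 B
1.2 1 A
1.2 2 B
1.2 3 C
1.2 4 D", header = TRUE)

set.seed(49)  # only so the draw can be repeated

# one randomly drawn (Survey, Period) pair per survey
chosen <- dt %>%
  distinct(Survey, Period) %>%   # unique survey/period combinations
  group_by(Survey) %>%           # sample within each survey
  slice_sample(n = 1) %>%        # assumes dplyr >= 1.0.0
  ungroup()

# all observations that belong to the drawn pairs
result <- inner_join(dt, chosen, by = c("Survey", "Period"))
```

Because distinct() runs before the draw, every period of a survey has the same chance of being picked, regardless of how many observations it contains.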

jtatria · Accepted Answer · 2015-11-11 18:39:16Z

You can achieve what you need in a straightforward way using plain vanilla base R, with something like this:

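    # d is the data frame holding the survey data
    out = d[0, ]  # make empty dataframe with similar structure.
    for (survey in levels(as.factor(d$Survey))) {  # for each value of survey
      # observed values of Period for this value of Survey
      # (a period with more observations is more likely to be drawn;
      #  wrap this in unique() to give every period the same chance)
      periods = d$Period[d$Survey == survey]
      # randomly choose 1 of them; indexing with sample.int() avoids
      # sample(x, 1) drawing from 1:x when x is a single number
      period = periods[sample.int(length(periods), 1)]
      # attach all rows with that survey and that period to the empty df above
      out = rbind(out, d[d$Survey == survey & d$Period == period, ])
    }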
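The same result can also be reached in base R without growing a data frame inside a loop. This is only a sketch under stated assumptions: the data frame is rebuilt here from the question's sample data under the name dt, and combos, chosen and result are hypothetical helper names.

```r
dt <- read.table(text = "Survey Period Observation
1.1 1 A
1.1 1 A
1.1 1 B
1.1 2 A
1.1 2 B
1.2 1 A
1.2 2 B
1.2 3 C
1.2 4 D", header = TRUE)

set.seed(49)  # only so the draw can be repeated

# unique (Survey, Period) combinations
combos <- unique(dt[, c("Survey", "Period")])

# split the combinations by survey and keep one random row from each piece
chosen <- do.call(rbind, lapply(split(combos, combos$Survey), function(p) {
  p[sample.int(nrow(p), 1), ]
}))

# keep every observation whose (Survey, Period) pair was drawn
result <- merge(dt, chosen, by = c("Survey", "Period"))
```

Drawing a row index with sample.int() sidesteps the case where a survey has a single period, and merge() then plays the role of the inner join.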
