
I have 18 columns and 100 rows, where the columns are 18 students and the rows are their grades in 100 exams. For each student, I want to randomly sample just one grade from their 100 grades. In other words, I want a sample with 18 columns and just 1 row. I have tried the apply and sample functions, but none of my attempts work, and I don't know why.

    bs = data.frame(matrix(nrow=1,ncol=18))
    for (i in colnames(high)){
      bs[,i]=sample(high[,i],1,replace=TRUE)
    }
    as.data.frame(lapply(high[,i],sample,18,replace=TRUE))
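One likely reason the loop misbehaves: `data.frame(matrix(nrow=1,ncol=18))` names its columns `X1` to `X18`, so `bs[,i] = ...` with a student name appends a new column instead of filling an existing one. The `lapply()` line also iterates over the individual grades of a single column `high[,i]`, not over the columns. The sketch below shows the first effect and one possible fix, using hypothetical stand-in data (the `student1` to `student18` names and random grades are assumptions, not the asker's data):

```r
# Hypothetical stand-in data: 100 exams (rows) x 18 students (columns)
set.seed(1)
high <- matrix(runif(100 * 18), ncol = 18,
               dimnames = list(paste0("exam", 1:100), paste0("student", 1:18)))

# Original pattern: bs has columns X1..X18, so bs[, "student1"] <- ...
# appends new columns and the result ends up with 36 columns
bad <- data.frame(matrix(nrow = 1, ncol = 18))
for (i in colnames(high)) bad[, i] <- sample(high[, i], 1)
ncol(bad)  # 36

# Possible fix: pre-allocate bs with the same column names as high
bs <- as.data.frame(matrix(NA_real_, nrow = 1, ncol = ncol(high),
                           dimnames = list(NULL, colnames(high))))
for (i in colnames(high)) {
  # draw one of the 100 grades in column i
  bs[, i] <- sample(high[, i], 1)
}
bs
```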

4 Answers


Try this:

    apply(data, 2, sample, size = 1)

Using @StupidWolf's data to test:

    set.seed(101)
    apply(high, 2, sample, size = 1)
    #   student1   student2   student3   student4   student5   student6
    # 0.57256477 0.84338121 0.71225050 0.56432392 0.23865929 0.23563641
    #   student7   student8   student9  student10  student11  student12
    # 0.51903694 0.36692427 0.51577410 0.45780908 0.19434773 0.70247028
    #  student13  student14  student15  student16  student17  student18
    # 0.60383059 0.25451088 0.78583242 0.86241707 0.05360842 0.61892604
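`apply()` with `MARGIN = 2` calls `sample(column, size = 1)` once per column and returns a named vector. Since the question asks for 18 columns and 1 row, a minimal sketch of turning that vector into a one-row data frame follows; the `t()` plus `as.data.frame()` step is an addition, not part of the answer above:

```r
# Rebuild @StupidWolf's example data (100 exams x 18 students)
set.seed(100)
high <- matrix(runif(100 * 18), ncol = 18)
colnames(high) <- paste0("student", 1:18)
rownames(high) <- paste0("exam", 1:100)

set.seed(101)
# MARGIN = 2: sample one value from each column
picked <- apply(high, 2, sample, size = 1)

# t() turns the named vector into a 1 x 18 matrix;
# as.data.frame() makes it a one-row data frame
one_row <- as.data.frame(t(picked))
dim(one_row)  # 1 18
```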

Let's say your data looks like this:

    set.seed(100)
    high = matrix(runif(100*18),ncol=18)
    colnames(high) = paste0("student",1:18)
    rownames(high) = paste0("exam",1:100)
    head(high)

            student1   student2  student3  student4  student5  student6   student7
    exam1 0.30776611 0.32741508 0.3695961 0.8495923 0.5112374 0.2202326 0.03176634
    exam2 0.25767250 0.38947869 0.9563228 0.6532260 0.2777107 0.7431595 0.57970549
    exam3 0.55232243 0.04105275 0.9135767 0.9508858 0.3606569 0.3059573 0.15420484
    exam4 0.05638315 0.36139663 0.8233363 0.6172230 0.4375279 0.4022088 0.12527050

What you want to do is sample from 1:100, 18 times, with replacement (similar to a bootstrap; thanks to @H1 for pointing this out):

    set.seed(101)
    take=sample(1:100,18,replace=TRUE)
    take
     [1] 73 57 46 95 81 58 95 61 60 59 99  3 32  9 96 99 99 98

As you can see, with replace=TRUE some indices are drawn more than once (99 appears three times). We take the 73rd entry of column 1, the 57th entry of column 2, and so on. This can be done with:

    high[cbind(take,1:18)]
     [1] 0.57256477 0.84338121 0.71225050 0.56432392 0.23865929 0.23563641
     [7] 0.51903694 0.36692427 0.51577410 0.45780908 0.19434773 0.70247028
    [13] 0.60383059 0.25451088 0.78583242 0.86241707 0.05360842 0.61892604
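The trick here is R's matrix indexing: indexing a matrix with a two-column matrix of (row, column) pairs returns one element per pair. A small sketch, first on a 3 x 3 matrix with known values, then on the grades matrix with `nrow()`/`ncol()` in place of the hard-coded 100 and 18 (the `setNames()` step is an illustrative addition):

```r
# Each row of idx is a (row, column) pair: m[idx] returns m[3, 1], m[1, 2], m[2, 3]
m <- matrix(1:9, nrow = 3)     # filled column-wise: m[, 1] is 1:3, m[, 2] is 4:6, m[, 3] is 7:9
idx <- cbind(c(3, 1, 2), 1:3)
m[idx]                         # 3 4 8

# Same idea for the grades matrix
set.seed(100)
high <- matrix(runif(100 * 18), ncol = 18,
               dimnames = list(paste0("exam", 1:100), paste0("student", 1:18)))
take <- sample(nrow(high), ncol(high), replace = TRUE)  # one random row per column
picked <- setNames(high[cbind(take, seq_len(ncol(high)))], colnames(high))
```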

1 Comment

Although it's not the simplest, this does give me insight into another way of solving this problem. Thank you!

You can use sample() to randomly select one value from each column.

I have created a small sample dataset here. In future, providing sample data with your question makes the problem much easier to understand.

    # sample data
    df <- data.frame(
      student1 = c(50, 45, 86, 30),
      student2 = c(56, 78, 63, 58),
      student3 = c(88, 60, 75, 93),
      student4 = c(87, 33, 49, 11),
      student5 = c(85, 96, 55, 64)
    )

Then loop through each student (column), randomly choose one of that student's grades, and append it to a vector. As a final step, since you want a data frame, convert the vector to a one-row data frame.

    # column names
    students <- colnames(df)
    # empty vector
    vals <- c()
    for(s in students) {
      grade <- sample(df[[s]], 1)
      vals <- c(vals, grade)
    }
    finalDF <- as.data.frame(t(vals))
    names(finalDF) <- students
    finalDF
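The same loop can be written with `vapply()`, which iterates over the columns of a data frame and preallocates the result instead of growing `vals` inside the loop. A sketch on the sample data above; the use of `vapply()` and of `replicate()` for drawing many samples at once is a swapped-in alternative, not the answer's code:

```r
df <- data.frame(
  student1 = c(50, 45, 86, 30),
  student2 = c(56, 78, 63, 58),
  student3 = c(88, 60, 75, 93),
  student4 = c(87, 33, 49, 11),
  student5 = c(85, 96, 55, 64)
)

# One draw per student; vapply() checks that each call returns a single number
vals <- vapply(df, function(col) sample(col, 1), numeric(1))
finalDF <- as.data.frame(t(vals))

# 1000 draws: replicate() returns a 5 x 1000 matrix, t() makes it 1000 x 5
draws <- t(replicate(1000, vapply(df, function(col) sample(col, 1), numeric(1))))
dim(draws)  # 1000 5
```

Note that `sample(x, 1)` behaves differently when `x` is a single number (it then samples from `1:x`), so this relies on every column having more than one grade.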

The output from two runs:

      student1 student2 student3 student4 student5
    1       45       78       93       87       64

      student1 student2 student3 student4 student5
    1       45       63       93       87       96

The other answers are really smart, but nonetheless, I hope this helps!

1 Comment

It's helpful for running a large number of iterations! Thank you!

You can shuffle the rows of your data frame:

    df <- df[sample(1:nrow(df)),]

then take the first observation of each group (this assumes long-format data, with one row per grade and a `group` column identifying the student):

    df.pick <- df[!duplicated(df$group) , ]
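Since the question's data is wide (one column per student), this approach needs a reshape first. A minimal sketch using base R's `stack()` to build the long format; the reshape step and the `grade`/`group` column names are assumptions for illustration:

```r
df <- data.frame(
  student1 = c(50, 45, 86, 30),
  student2 = c(56, 78, 63, 58),
  student3 = c(88, 60, 75, 93)
)

# stack() reshapes wide to long: columns 'values' and 'ind' (student name as a factor)
long <- stack(df)
names(long) <- c("grade", "group")

long <- long[sample(nrow(long)), ]       # shuffle the rows
pick <- long[!duplicated(long$group), ]  # first row per student after shuffling
pick
```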
