0

This is a follow up question to this one. In the original question the OP wanted to perform bootstrap on two columns x1 and x2 that are fixed:

set.seed(1000) data <- as.data.table(list(x1 = runif(200), x2 = runif(200), group = runif(200)>0.5)) stat <- function(x, i) {x[i, c(m1 = mean(x1), m2 = mean(x2))]} data[, list(list(boot(.SD, stat, R = 10))), by = group]$V1 

However, I think this problem can be nicely extended to handle any number of columns by treating them as groups. For instance, lets use the iris dataset. Say I want to calculate bootstrap mean for all four dimensions for each species. I can use melt to flip the data and then use the Species, variable combination to get the mean in one go - I think this approach will scale well.

data(iris) iris = data.table(iris) iris[,mean(Sepal.Length),by=Species] iris[,ID:=.N,] iris_deep = melt(iris ,id.vars = c("ID","Species") ,measure.vars = c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width")) #define a mean bootstrap function stat <- function(x, i) {x[i, m=mean(value),]} iris_deep[, list(list(boot(.SD, stat, R = 100))), by = list(Species,variable)]$V1 

Here is my attempt at doing this. However the bootstrapping part does not seem to be working. As R throws the following error:

Error in mean(value) : object 'value' not found 

Can someone please take a crack at this?

2 Answers 2

1

I tried this (with added braces enclosing m=mean(value)) and it appears to work:

stat <- function(x, i) {x[i, (m=mean(value))]} 
Sign up to request clarification or add additional context in comments.

Comments

0

We can utilize each bootstrap fully, and calculate the mean for each variable within each group, instead of rerunning the bootstrap for each variable.

So if we do something like this, it calculates the mean for each variable:

iris = data.table(iris) iris[sample(nrow(iris),replace=TRUE),lapply(.SD,mean,na.rm=TRUE),by=Species] 

Because boot requires a vector / matrix output, we need to modify the output above, and provide names for the vector:

d = function(dat,ind){ k = dat[ind,lapply(.SD,mean,na.rm=TRUE),by=Species] k_vec = unlist(k[,-1]) names(k_vec) = paste(rep(colnames(k)[-1],each=nrow(k)),rep(k$Species,(ncol(k)-1)),sep="_") k_vec } d(iris,sample(nrow(iris),replace=TRUE)) Sepal.Length_versicolor Sepal.Length_virginica Sepal.Length_setosa 5.8784314 6.4851852 4.9688889 Sepal.Width_versicolor Sepal.Width_virginica Sepal.Width_setosa 2.7392157 2.9814815 3.3977778 Petal.Length_versicolor Petal.Length_virginica Petal.Length_setosa 4.1980392 5.5037037 1.4644444 Petal.Width_versicolor Petal.Width_virginica Petal.Width_setosa 1.2960784 2.0944444 0.2333333 

And use boot with strata = iris$Species to ensure the Species are sampled evenly:

bo_strata = boot(iris,d,R=1000,strata=iris$Species) 

We can compare the distributions of this approach compared to that in the question:

stat <- function(x, i) {x[i, (m=mean(value))]} bo_melt = iris_deep[, list(list(boot(.SD, stat, R = 1000))), by = list(Species,variable)]$V1 par(mfrow=c(4,3)) par(mar=c(3,3,3,3)) for(i in 1:ncol(bo_strata$t)){ plot(density(bo_strata$t[,i]),main=names(bo_strata$t0)[i],col="#43658b") lines(density(bo_melt[[i]]$t),col="#ffa372") legend("topright",fill=c("#43658b","#ffa372"),c("strata","other")) } 

enter image description here

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.