Bootstrapping multiple columns with R

Question

I'm relatively new at R and I'm trying to build a function which will loop through columns in an imported table and produce an output which consists of the means and 95% confidence intervals. Ideally it should be possible to bootstrap columns with different sample sizes, but first I would like to get the iteration working. I have something that sort-of works, but I can't get it all the way there. This is what the code looks like, with the sample data and output included:

#cdata<-read.csv(file.choose(),header=T)#read data from selected file, works, commented out because data is provided below #cdata #check imported data #Sample Data # WALL NRPK CISC WHSC LKWH YLPR #1 21 8 1 2 2 5 #2 57 9 3 1 0 1 #3 45 6 9 1 2 0 #4 17 10 2 0 3 0 #5 33 2 4 0 0 0 #6 41 4 13 1 0 0 #7 21 4 7 1 0 0 #8 32 7 1 7 6 0 #9 9 7 0 5 1 0 #10 9 4 1 0 0 0 x<-cdata[,c("WALL","NRPK","LKWH","YLPR")] #only select relevant species i<-nrow(x) #count number of rows for bootstrapping g<-ncol(x) #count number of columns for iteration #build bootstrapping function, this works for the first column but doesn't iterate bootfun <- function(bootdata, reps) { boot <- function(bootdata){ s1=sample(bootdata, size=i, replace=TRUE) ms1=mean(s1) return(ms1) } # a single bootstrap bootrep <- replicate(n=reps, boot(bootdata)) return(bootrep) } #replicates bootstrap of "bootdata" "reps" number of times and outputs vector of results cvr1 <- bootfun(x$YLPR,50000) #have unsuccessfully tried iterating the location various ways (i.e. x[i]) cvrquantile<-quantile(cvr1,c(0.025,0.975)) cvrmean<-mean(cvr1) vec<-c(cvrmean,cvrquantile) #puts results into a suitable form for output vecr<-sapply(vec,round,1) #rounds results vecr 2.5% 97.5% 28.5 19.4 38.1 #apply(x[1:g],2,bootfun) ##doesn't work in this case #desired output: #Species Mean LowerCI UpperCI #WALL 28.5 19.4 38.1 #NRPK 6.1 4.6 7.6 #YLPR 0.6 0.0 1.6

I've also tried this using the boot package, and it works beautifully to iterate through the means but I can't get it to do the same with the confidence intervals. The "ordinary" code above also has the advantage that you can easily retrieve the bootstrapping results, which might be used for other calculations. For the sake of completeness here is the boot code:

#Bootstrapping using boot package library(boot) #data<-read.csv(file.choose(),header=TRUE) #read data from selected file #x<-data[,c("WALL","NRPK","LKWH","YLPR")] #only select relevant columns #x #check data #Sample Data # WALL NRPK LKWH YLPR #1 21 8 2 5 #2 57 9 0 1 #3 45 6 2 0 #4 17 10 3 0 #5 33 2 0 0 #6 41 4 0 0 #7 21 4 0 0 #8 32 7 6 0 #9 9 7 1 0 #10 9 4 0 0 i<-nrow(x) #count number of rows for resampling g<-ncol(x) #count number of columns to step through with bootstrapping boot.mean<-function(x,i){boot.mean<-mean(x[i])} #bootstrapping function to get the mean z<-boot(x, boot.mean,R=50000) #bootstrapping function, uses mean and number of reps boot.ci(z,type="perc") #derive 95% confidence intervals apply(x[1:g],2, boot.mean) #bootstrap all columns #output: #WALL NRPK LKWH YLPR #28.5 6.1 1.4 0.6

I've gone through all of the resources I can find and can't seem to get things working. What I would like for output would be the bootstrapped means with the associated confidence intervals for each column. Thanks!

What do you mean when you say an output that consists of the means and confidence intervals? Do you mean you want the mean, 0.025, and 0.975 percentiles of the calculated statistic? — alexwhitworth
– alexwhitworth, Commented Feb 3, 2016 at 1:02
Also, for 50k replicates * n columns, you probably want to use boot::boot since it provides integration with the parallel package — alexwhitworth
– alexwhitworth, Commented Feb 3, 2016 at 1:04
When I say an output, I mean I would like to have the results displayed in a table which consists of the column name for each column, the bootstrapped means, and the associated confidence intervals. Thank you for the advice about boot::boot. — JK101
– JK101, Commented Feb 3, 2016 at 3:21
... your response didn't answer my question at all, but glad you got the answer you needed — alexwhitworth
– alexwhitworth, Commented Feb 3, 2016 at 4:35
I'm afraid I must have misunderstood your question. I provided an example of what I was looking for as the desired output in my code example, which showed the bootstrapped means with the 95% confidence intervals. Excuse me please, I have a bad cold (and I've taken cold medication) so I obviously goofed somehow. Thanks for trying to help, it is appreciated. — JK101
– JK101, Commented Feb 3, 2016 at 4:50

fishtank · Accepted Answer · 2016-02-03 00:29:53Z

Note: apply(x[1:g],2, boot.mean) #bootstrap all columns doesn't do any bootstrap. You are simply calculating the mean for each column.

For bootstrap mean and confidence interval, try this:

apply(x,2,function(y){ b<-boot(y,boot.mean,R=50000); c(mean(b$t),boot.ci(b,type="perc", conf=0.95)$percent[4:5]) })

This works perfectly. I'm still getting some odd niggling errors with my code (not yours) but what you provided produces the exact results I was looking for. Thank you so much! :)

Collectives™ on Stack Overflow

Bootstrapping multiple columns with R

1 Answer 1

1 Comment

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Related