I'm relatively new at R and I'm trying to build a function which will loop through columns in an imported table and produce an output which consists of the means and 95% confidence intervals. Ideally it should be possible to bootstrap columns with different sample sizes, but first I would like to get the iteration working. I have something that sort-of works, but I can't get it all the way there. This is what the code looks like, with the sample data and output included:
#cdata<-read.csv(file.choose(),header=T)#read data from selected file, works, commented out because data is provided below #cdata #check imported data #Sample Data # WALL NRPK CISC WHSC LKWH YLPR #1 21 8 1 2 2 5 #2 57 9 3 1 0 1 #3 45 6 9 1 2 0 #4 17 10 2 0 3 0 #5 33 2 4 0 0 0 #6 41 4 13 1 0 0 #7 21 4 7 1 0 0 #8 32 7 1 7 6 0 #9 9 7 0 5 1 0 #10 9 4 1 0 0 0 x<-cdata[,c("WALL","NRPK","LKWH","YLPR")] #only select relevant species i<-nrow(x) #count number of rows for bootstrapping g<-ncol(x) #count number of columns for iteration #build bootstrapping function, this works for the first column but doesn't iterate bootfun <- function(bootdata, reps) { boot <- function(bootdata){ s1=sample(bootdata, size=i, replace=TRUE) ms1=mean(s1) return(ms1) } # a single bootstrap bootrep <- replicate(n=reps, boot(bootdata)) return(bootrep) } #replicates bootstrap of "bootdata" "reps" number of times and outputs vector of results cvr1 <- bootfun(x$YLPR,50000) #have unsuccessfully tried iterating the location various ways (i.e. x[i]) cvrquantile<-quantile(cvr1,c(0.025,0.975)) cvrmean<-mean(cvr1) vec<-c(cvrmean,cvrquantile) #puts results into a suitable form for output vecr<-sapply(vec,round,1) #rounds results vecr 2.5% 97.5% 28.5 19.4 38.1 #apply(x[1:g],2,bootfun) ##doesn't work in this case #desired output: #Species Mean LowerCI UpperCI #WALL 28.5 19.4 38.1 #NRPK 6.1 4.6 7.6 #YLPR 0.6 0.0 1.6 I've also tried this using the boot package, and it works beautifully to iterate through the means but I can't get it to do the same with the confidence intervals. The "ordinary" code above also has the advantage that you can easily retrieve the bootstrapping results, which might be used for other calculations. For the sake of completeness here is the boot code:
#Bootstrapping using boot package library(boot) #data<-read.csv(file.choose(),header=TRUE) #read data from selected file #x<-data[,c("WALL","NRPK","LKWH","YLPR")] #only select relevant columns #x #check data #Sample Data # WALL NRPK LKWH YLPR #1 21 8 2 5 #2 57 9 0 1 #3 45 6 2 0 #4 17 10 3 0 #5 33 2 0 0 #6 41 4 0 0 #7 21 4 0 0 #8 32 7 6 0 #9 9 7 1 0 #10 9 4 0 0 i<-nrow(x) #count number of rows for resampling g<-ncol(x) #count number of columns to step through with bootstrapping boot.mean<-function(x,i){boot.mean<-mean(x[i])} #bootstrapping function to get the mean z<-boot(x, boot.mean,R=50000) #bootstrapping function, uses mean and number of reps boot.ci(z,type="perc") #derive 95% confidence intervals apply(x[1:g],2, boot.mean) #bootstrap all columns #output: #WALL NRPK LKWH YLPR #28.5 6.1 1.4 0.6 I've gone through all of the resources I can find and can't seem to get things working. What I would like for output would be the bootstrapped means with the associated confidence intervals for each column. Thanks!
boot::bootsince it provides integration with theparallelpackage