Yes, you can approximate $\mathbb{P}\left(\bar{X}_n \leq x\right)$ by $\mathbb{P}\left(\bar{X}_n^* \leq x\right)$, but it is not optimal. This is a form of the percentile bootstrap. However, the percentile bootstrap does not perform well for inference about the population mean unless the sample size is large. (It does perform well for many other inference problems, including when the sample size is small.) I take this conclusion from Wilcox's Modern Statistics for the Social and Behavioral Sciences, CRC Press, 2012. A theoretical proof is beyond me, I'm afraid.
A variant on the centering approach goes one step further and scales the centered bootstrap statistic by the re-sample standard deviation and the sample size, so that it is calculated in the same way as a t statistic. The quantiles of the distribution of these t statistics can be used to construct a confidence interval or perform a hypothesis test. This is the bootstrap-t method, and it gives superior results when making inferences about the mean.
Let $s^*$ be the standard deviation of a bootstrap re-sample, using $n-1$ as the denominator, and let $s$ be the standard deviation of the original sample. Let
$T^*=\frac{\bar{X}_n^*-\bar{X}}{s^*/\sqrt{n}}$
The 97.5th and 2.5th percentiles of the simulated distribution of $T^*$ give a confidence interval for $\mu$:
$\bar{X}-T^*_{0.975} \frac{s}{\sqrt{n}}, \bar{X}-T^*_{0.025} \frac{s}{\sqrt{n}}$
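To see why the upper percentile of $T^*$ ends up in the lower limit, here is a sketch of the reasoning. It assumes that the bootstrap distribution of $T^*$ is a good stand-in for the sampling distribution of the ordinary statistic $T=\frac{\bar{X}-\mu}{s/\sqrt{n}}$ (the symbol $T$ is introduced here only for illustration). Under that assumption,

$$\mathbb{P}\left(T^*_{0.025} \leq \frac{\bar{X}-\mu}{s/\sqrt{n}} \leq T^*_{0.975}\right) \approx 0.95$$

Multiplying through by $s/\sqrt{n}$, subtracting $\bar{X}$, and multiplying by $-1$ (which reverses the inequalities) gives

$$\mathbb{P}\left(\bar{X}-T^*_{0.975}\frac{s}{\sqrt{n}} \leq \mu \leq \bar{X}-T^*_{0.025}\frac{s}{\sqrt{n}}\right) \approx 0.95$$

So the interval is not forced to be symmetric about $\bar{X}$: if the distribution of $T^*$ is skewed, the two limits sit at different distances from the sample mean.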
Consider the simulation results below, showing that with a badly skewed mixed distribution the confidence intervals from this method contain the true value more frequently than either the percentile bootstrap method or a traditional inversion of a t statistic with no bootstrapping.
    compare.boots <- function(samp, reps = 599){
        # "samp" is the actual original observed sample
        # "s" is a re-sample for bootstrap purposes
        n <- length(samp)
        boot.t <- numeric(reps)
        boot.p <- numeric(reps)
        for(i in 1:reps){
            s <- sample(samp, replace=TRUE)
            boot.t[i] <- (mean(s)-mean(samp)) / (sd(s)/sqrt(n))
            boot.p[i] <- mean(s)
        }
        conf.t <- mean(samp)-quantile(boot.t, probs=c(0.975,0.025))*sd(samp)/sqrt(n)
        conf.p <- quantile(boot.p, probs=c(0.025, 0.975))
        return(rbind(conf.t, conf.p, "Trad T test"=t.test(samp)$conf.int))
    }

    # Tests below will be for case where sample size is 15
    n <- 15

    # Create a population that is normally distributed
    set.seed(123)
    pop <- rnorm(1000,10,1)
    my.sample <- sample(pop,n)

    # All three methods have similar results when normally distributed
    compare.boots(my.sample)
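This gives the following (conf.t is the bootstrap-t method; conf.p is the percentile bootstrap method). The column labels are inherited from the quantile names of the first row; in every row the first column is the lower limit and the second column is the upper limit.

                   97.5%     2.5%
    conf.t      9.648824 10.98006
    conf.p      9.808311 10.95964
    Trad T test 9.681865 11.01644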
With a single example from a skewed distribution:
    # create a population that is a mixture of two normal and one gamma distribution
    set.seed(123)
    pop <- c(rnorm(1000,10,2),rgamma(3000,3,1)*4, rnorm(200,45,7))
    my.sample <- sample(pop,n)
    mean(pop)
    compare.boots(my.sample)
This gives the following. Note that "conf.t", the bootstrap-t version, gives a wider confidence interval than the other two. Basically, it is better at responding to the unusual distribution of the population.
    > mean(pop)
    [1] 13.02341
    > compare.boots(my.sample)
                    97.5%     2.5%
    conf.t      10.432285 29.54331
    conf.p       9.813542 19.67761
    Trad T test  8.312949 20.24093
Finally, here are a thousand simulations to see which version gives confidence intervals that are most often correct:
    # simulation study
    set.seed(123)
    sims <- 1000
    results <- matrix(FALSE, sims,3)
    colnames(results) <- c("Bootstrap T", "Bootstrap percentile", "Trad T test")

    for(i in 1:sims){
        pop <- c(rnorm(1000,10,2),rgamma(3000,3,1)*4, rnorm(200,45,7))
        my.sample <- sample(pop,n)
        mu <- mean(pop)
        x <- compare.boots(my.sample)
        for(j in 1:3){
            results[i,j] <- x[j,1] < mu & x[j,2] > mu
        }
    }

    apply(results,2,sum)
This gives the results below: the numbers are the times out of 1,000 that the confidence interval contains the true mean of the simulated population. Notice that the actual coverage rate of every version is considerably less than the nominal 95%.
             Bootstrap T Bootstrap percentile          Trad T test
                     901                  854                  890