I have a response variable, y.hat, that is an estimate of animal abundance, and I know the standard error of y.hat. I'm skeptical of a recommendation to use the uncertainty in y.hat as a weight when I regress or calibrate y.hat against another variable. There are a few parts to consider. First, the standard error of y.hat tends to increase with y.hat, so large abundance estimates receive less weight than small ones, which would seem to bias the fit low. Second, the independent variable is positively correlated with y.hat, so there is more uncertainty on the right-hand side of the plot. That is heteroscedasticity, which is exactly the situation where I'd think WLS is appropriate. So I think we have a potential trade-off between introducing bias (because the weights covary with y) and accommodating heteroscedasticity.
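To make the heteroscedasticity half of that trade-off concrete, here is a minimal R sketch (made-up data; names like `fit.wls` are mine, not from my actual analysis). When the error SD grows with x (the known design variable) rather than with the realized y, weighting by the inverse error variance is the textbook WLS remedy, and both OLS and WLS stay roughly unbiased for the slope:

```r
set.seed(1)
n <- 100
x <- runif(n, 10, 30)
sd.i <- 0.3 * x                            # error SD grows with x: heteroscedastic
y <- 2 * x + rnorm(n, 0, sd.i)             # true slope is 2
fit.ols <- lm(y ~ x)                       # ignores heteroscedasticity
fit.wls <- lm(y ~ x, weights = 1 / sd.i^2) # classic WLS
coef(fit.ols)                              # slope near 2
coef(fit.wls)                              # slope near 2, smaller SE
```

The distinction I'm trying to draw: here the weights are a function of x, not of the noisy realized y.hat, so they do not covary with the errors.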
Even if the uncertainty were randomly assigned to each data pair, I still don't see why we'd want to use it as weights. Here's a little R code that simulates data where the uncertainty is random (the default) vs. a function of y.hat (the commented-out line). Rather than regressing, this code calibrates x to y.hat using a mean of ratios. The result is that using weights gives a biased estimate of the true ratio (2) when the uncertainty is correlated with y.hat, and an unbiased but relatively imprecise estimate when it is not.
Am I right that using uncertainty in the estimate of y as a weight is inappropriate in this context?
N <- 6
reps <- 5000
out1 <- matrix(NA, reps, 2)
for (i in 1:reps) {
  x <- runif(N, 10, 30)
  y.hat <- rnorm(N, 2 * x, 10)   # true ratio of y.hat to x is 2
  # se <- -0.1 + 0.3 * y.hat     # uncertainty a function of y.hat
  se <- rnorm(N, 7, 4)           # uncertainty random (default)
  w <- 1 / se^2
  out1[i, 1] <- mean(y.hat / x)              # unweighted mean of ratios
  out1[i, 2] <- sum(y.hat / x * w) / sum(w)  # weighted mean of ratios
}
hist(out1[, 1], 50)
hist(out1[, 2], 50)
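For comparison, here is the same simulation with the commented-out SE line made active (everything else unchanged), which produces the biased-low weighted estimate I described: small y.hat values get large weights, pulling the weighted mean of ratios below the true value of 2 while the unweighted mean stays near 2.

```r
set.seed(1)
N <- 6
reps <- 5000
out2 <- matrix(NA, reps, 2)
for (i in 1:reps) {
  x <- runif(N, 10, 30)
  y.hat <- rnorm(N, 2 * x, 10)
  se <- -0.1 + 0.3 * y.hat       # SE now increases with y.hat
  w <- 1 / se^2                  # so small y.hat gets large weight
  out2[i, 1] <- mean(y.hat / x)              # unweighted mean of ratios
  out2[i, 2] <- sum(y.hat / x * w) / sum(w)  # weighted mean of ratios
}
colMeans(out2)   # unweighted column near 2; weighted column below 2
```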