Simple constant-width prediction interval for a regression model

Consider the following approach to generating prediction intervals for a regression problem:

  1. Train a regression model on a training set. Let $f$ denote the fitted model, i.e. $f(x_i)$ is the model's prediction given inputs $x_i \in \mathbb{R}^k$.
  2. Compute the model's predictions and prediction errors $e_i \equiv y_i - f(x_i)$ on a test set (i.e. not the training set used to fit $f$ in step 1).
  3. Define a desired prediction interval coverage probability $p \in \left[0, 1\right]$. For example, we might set $p=0.8$ if we want 80% of future observations to fall inside our prediction intervals. Calculate the $\tau_{\text{lower}} \equiv (1-p)/2$ and $\tau_{\text{upper}} \equiv (1+p)/2$ quantiles of the test set errors. With $p=0.8$ we'd have $\tau_{\text{lower}} = 0.1$ and $\tau_{\text{upper}} = 0.9$, so we'd be calculating the 10th and 90th percentiles of our errors on the test set. Let $Q^\text{lower}$ and $Q^\text{upper}$ denote these error quantiles, and note that we typically expect $Q^\text{lower} < 0 < Q^\text{upper}$.
  4. At prediction time (when making new predictions "in production"), construct prediction intervals $\left[f(x_i) + Q^\text{lower}, f(x_i) + Q^\text{upper}\right]$. The intent is for $y_i$ to be inside this interval with probability $p$.

Note that the intervals described above are "simple" in the sense that their width $Q^\text{upper} - Q^\text{lower}$ does not vary with $x_i$. We compute the width in step 3 using our test set errors, and it is fixed going forward to step 4.
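The four steps can be sketched as follows. This is a minimal illustration with NumPy only, using a synthetic data generating process and a simple least-squares fit as stand-ins for the real data and model; everything about the data (the `draw` function, the noise distribution) is an assumption for demonstration purposes, not part of the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic data generating process (for illustration only):
# y = 2x + Gaussian noise.
def draw(n):
    x = rng.uniform(0, 10, size=n)
    y = 2.0 * x + rng.normal(0, 1, size=n)
    return x, y

# Step 1: fit a regression model on a training set (here, least squares).
x_train, y_train = draw(1000)
slope, intercept = np.polyfit(x_train, y_train, deg=1)
f = lambda x: slope * x + intercept

# Step 2: compute errors e_i = y_i - f(x_i) on a held-out test set.
x_test, y_test = draw(1000)
errors = y_test - f(x_test)

# Step 3: take the (1-p)/2 and (1+p)/2 quantiles of the test-set errors.
p = 0.8
q_lower, q_upper = np.quantile(errors, [(1 - p) / 2, (1 + p) / 2])

# Step 4: constant-width intervals around new "production" predictions.
x_new, y_new = draw(10000)
lower = f(x_new) + q_lower
upper = f(x_new) + q_upper
coverage = np.mean((y_new >= lower) & (y_new <= upper))
print(round(coverage, 3))  # should land close to 0.8 under these assumptions
```

Note that the interval width `q_upper - q_lower` is computed once in step 3 and reused unchanged for every new $x_i$ in step 4.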

Question: If we assume that the data in all steps (the training set, the test set, and the "production" data) are i.i.d. samples from the same data generating process, i.e. from the same joint distribution over $X$ and $Y$, will the prediction intervals described in step 4 have the desired coverage in expectation? Will we have $\Pr\left[y_i \in \left[f(x_i) + Q^\text{lower}, f(x_i) + Q^\text{upper}\right]\right] \approx p$? (Note that this probability is not conditional on $x_i$. There could be values of $x_i$ for which $\Pr\left[y_i \in \left[f(x_i) + Q^\text{lower}, f(x_i) + Q^\text{upper}\right] \, | \, x_i\right] \neq p$, and that would be fine as long as we have the desired coverage on average, taking an expectation over all future observations.)
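The parenthetical point can be made concrete with a small simulation. Below, the noise scale grows with $x$ (a heteroskedastic data generating process, chosen purely for illustration), so a constant-width interval over-covers at small $x$ and under-covers at large $x$, even while marginal coverage stays near $p$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed heteroskedastic DGP (illustration only): noise scale grows with x.
def draw(n):
    x = rng.uniform(0, 10, size=n)
    y = 2.0 * x + rng.normal(0, 0.2 + 0.3 * x)
    return x, y

# Steps 1-3: fit on a training set, take error quantiles on a test set.
x_train, y_train = draw(2000)
slope, intercept = np.polyfit(x_train, y_train, deg=1)
f = lambda x: slope * x + intercept

x_test, y_test = draw(2000)
errors = y_test - f(x_test)
q_lo, q_hi = np.quantile(errors, [0.1, 0.9])  # p = 0.8

# Step 4: check coverage on fresh draws, overall and by region of x.
x_new, y_new = draw(50000)
covered = (y_new >= f(x_new) + q_lo) & (y_new <= f(x_new) + q_hi)

print(round(covered.mean(), 3))             # marginal coverage: near 0.8
print(round(covered[x_new < 3].mean(), 3))  # low-noise region: well above 0.8
print(round(covered[x_new > 7].mean(), 3))  # high-noise region: below 0.8
```

So under this DGP the intervals achieve roughly the desired coverage on average over $X$, while conditional coverage varies with $x_i$, exactly the distinction the question draws.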

If so, does this technique have a name? Would it work for any arbitrary regression model (linear regression, gradient boosting, neural nets, etc), and for any distribution of $Y \,|\, X$ (not necessarily Gaussian)?


Python 3.8 simulation (using CatBoost, https://catboost.ai/docs/concepts/python-reference_catboostregressor.html): https://gist.github.com/atorch/ee2caf3b81156273e9df3947c8f8b854. (I originally tried copying the code directly into a `<pre>` block, but I couldn't get it to render properly.)

Adrian