
An ARMA(p,q)-GARCH(r,s) model specifies the conditional distribution of a time series $x_t$:

$$\begin{aligned} x_t &= \mu_t + u_t, \\ \mu_t &= \varphi_0 + \varphi_1 x_{t-1} + \dots + \varphi_p x_{t-p} + \theta_1 u_{t-1} + \dots + \theta_q u_{t-q}, \\ u_t &= \sigma_t \varepsilon_t, \\ \sigma_t^2 &= \omega + \alpha_1 u_{t-1}^2 + \dots + \alpha_s u_{t-s}^2 + \beta_1 \sigma_{t-1}^2 + \dots + \beta_r \sigma_{t-r}^2, \\ \varepsilon_t &\sim iid(0,1). \end{aligned}$$

Suppose that, based on the ARMA(p,q) part of the model, we have predicted the conditional mean over the forecast horizon $h$, i.e., we have estimates $\hat{\mu}_{t+1}, \hat{\mu}_{t+2}, \dots, \hat{\mu}_{t+h}$. Further, from the GARCH(r,s) part we have estimated the conditional variance over the same horizon: $\hat{\sigma}^{2}_{t+1}, \hat{\sigma}^{2}_{t+2}, \dots, \hat{\sigma}^{2}_{t+h}$.

The problem is to come up with forecasts for the process $x$ itself, i.e., $\hat{x}_{t+1}, \hat{x}_{t+2}, \dots, \hat{x}_{t+h}$. The accepted approach is to take the predicted conditional means $(\hat{\mu}_{t+1}, \hat{\mu}_{t+2}, \dots, \hat{\mu}_{t+h})$ as the point forecasts for the series $x$.

Question: why not also utilize the predicted conditional variance when coming up with point forecasts? Is there anything wrong with estimating the point forecasts as follows: $\hat{x}_{t+1} = \hat{\mu}_{t+1} + \hat{\sigma}^{2}_{t+1}\epsilon_{t+1}, \; \hat{x}_{t+2} = \hat{\mu}_{t+2} + \hat{\sigma}^{2}_{t+2}\epsilon_{t+2}, \dots, \hat{x}_{t+h} = \hat{\mu}_{t+h} + \hat{\sigma}^{2}_{t+h}\epsilon_{t+h}$?

P.S. The reason I am suggesting the approach described above is to preserve the volatility-clustering effect in the forecasted values as well. If we utilize only the mean equation (the ARMA part), we remove the volatility clustering from the predicted values.
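To make the volatility-clustering point concrete, here is a minimal numpy sketch (parameter values are made up for illustration, not estimated from any data) that simulates one ARMA(1,1)-GARCH(1,1) path: the shocks $u_t$ are serially (nearly) uncorrelated, but their squares are positively autocorrelated, which is what produces the clustering.

```python
import numpy as np

# Hypothetical ARMA(1,1)-GARCH(1,1) parameters, chosen only for illustration.
phi0, phi1, theta1 = 0.0, 0.5, 0.2       # mean equation
omega, alpha1, beta1 = 0.1, 0.1, 0.85    # variance equation

rng = np.random.default_rng(0)
n = 5000
x = np.zeros(n)
u = np.zeros(n)
sigma2 = np.full(n, omega / (1 - alpha1 - beta1))  # start at unconditional variance

for t in range(1, n):
    sigma2[t] = omega + alpha1 * u[t - 1] ** 2 + beta1 * sigma2[t - 1]
    u[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    x[t] = phi0 + phi1 * x[t - 1] + theta1 * u[t - 1] + u[t]

def lag1_autocorr(z):
    """Sample lag-1 autocorrelation."""
    z = z - z.mean()
    return float((z[:-1] * z[1:]).mean() / (z ** 2).mean())

# Volatility clustering: u_t is (nearly) uncorrelated, but u_t^2 is not.
print("acf1(u)  =", lag1_autocorr(u))
print("acf1(u^2)=", lag1_autocorr(u ** 2))
```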

  • Your point forecasts are infeasible, because they involve future observations of the error term. The expected values of these terms are zero, so point forecasts that target the conditional mean boil down to just the $\hat\mu$s. Commented Apr 8 at 19:28
  • The point predictions derived from $\hat{x}_{t+i} = \hat{\mu}_{t+i} + \hat{\sigma}^{2}_{t+i}\epsilon_{t+i}$ for all $i$ form just one sequence of realizations of point predictions, which over many paths boils down to the $\hat{\mu}_{t+i}$'s, correct? Then why is it wrong to randomly generate a sequence of realizations of point predictions (randomly, because we are going to randomly pick a realization from $\epsilon_{t+i} \sim N(0, \hat{\sigma}_{t+i})$)? Commented Apr 9 at 8:54
  • I have never seen a point prediction generated like this, but I suppose you can do that. This will not be optimal for any target functional of the distribution, however. If you suffer loss from forecast errors (there is a loss function $L(e)$), you can derive an optimal point forecast, and it will be a fixed functional of the distribution (such as mean, quantile, expectile, ...), not a random realization from the distribution. On the other hand, you can simulate scenarios (future paths) in your way, and that is fine for quantifying uncertainty or doing some sort of optimization. Commented Apr 9 at 10:18

2 Answers


Initial answer:

  • Your point forecasts are infeasible, because they involve future observations of the error term. The expected values of these terms are zero, so point forecasts that target the conditional mean boil down to just the $\hat\mu$s. Also, your multi-step forecasts ($h>1$) ignore shocks occurring in the intermediate periods.

Addition after a clarifying comment:

  • I have never seen a point prediction generated like this [with a random realization of a variable added to a known constant], but I suppose you can do that. This will not be optimal for any target functional of the distribution, however. If you suffer loss from forecast errors (there is a loss function $L(e)$), you can derive an optimal point forecast, and it will be a fixed functional of the distribution (such as mean, quantile, expectile, ...), not a random realization from the distribution. On the other hand, you can simulate scenarios (future paths) in your way (once you include intermediate shocks in your multi-step forecasts), and that is fine for quantifying uncertainty or doing some sort of optimization.
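To illustrate the last point, here is a rough numpy sketch of scenario simulation for an AR(1)-GARCH(1,1) (hypothetical parameter values and state, standard normal innovations assumed; in practice these would come from your estimation step). Each simulated path draws the intermediate shocks rather than setting them to zero; averaging across paths then recovers the deterministic conditional-mean forecast, while path quantiles quantify the uncertainty.

```python
import numpy as np

# Hypothetical "fitted" AR(1)-GARCH(1,1) parameters and last observed state.
phi0, phi1 = 0.0, 0.5
omega, alpha1, beta1 = 0.1, 0.1, 0.85
x_t, u_t, s2_t = 0.3, 0.2, 0.5

rng = np.random.default_rng(1)
h, n_paths = 10, 20_000
paths = np.empty((n_paths, h))

x_prev = np.full(n_paths, x_t)
u_prev = np.full(n_paths, u_t)
s2_prev = np.full(n_paths, s2_t)
for j in range(h):
    s2 = omega + alpha1 * u_prev ** 2 + beta1 * s2_prev   # GARCH recursion
    u = np.sqrt(s2) * rng.standard_normal(n_paths)        # draw intermediate shocks
    x = phi0 + phi1 * x_prev + u                          # mean equation
    paths[:, j] = x
    x_prev, u_prev, s2_prev = x, u, s2

# Point forecast under squared-error loss: the mean across simulated paths ...
mean_forecast = paths.mean(axis=0)
# ... which matches the deterministic conditional-mean recursion.
analytic = np.empty(h)
m = x_t
for j in range(h):
    m = phi0 + phi1 * m
    analytic[j] = m
# Interval forecasts come from path quantiles, e.g. a 90% band:
band = np.quantile(paths, [0.05, 0.95], axis=0)
print(np.max(np.abs(mean_forecast - analytic)))
```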

The model you specify assumes the standardized innovation $\varepsilon_t$ is independent of the conditional variance $\sigma_t^2$. We can work through the equations as follows.

Start with the time $t+h$ equation for $x$: $$x_{t+h}=\mu_{t+h} + u_{t+h}$$ Under squared error loss, we can use the expectation operator to find the point forecast: $$E(x_{t+h})=E(\mu_{t+h}) + E(u_{t+h})$$ Plugging in for $u_{t+h}$ we have: $$E(x_{t+h})=E(\mu_{t+h}) + E(\sigma_{t+h}\varepsilon_{t+h})$$ Using the formula for covariance, this is equivalent to: $$E(x_{t+h})=E(\mu_{t+h}) + E(\sigma_{t+h})E(\varepsilon_{t+h})+Cov(\sigma_{t+h},\varepsilon_{t+h})$$ From the model assumption that $\varepsilon_t$ is iid with mean zero, we have both $E(\varepsilon_{t+h})=0$ and $Cov(\sigma_{t+h},\varepsilon_{t+h})=0$, so we are left with: $$E(x_{t+h})=E(\mu_{t+h})$$ Or, using your hat notation: $$\hat{x}_{t+h} = \hat{\mu}_{t+h} $$
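If it helps, the key step above, $E(\sigma_{t+h}\varepsilon_{t+h})=0$, can be checked numerically. Here is a small simulation sketch (arbitrary GARCH(1,1) parameters, normal innovations assumed): because $\sigma_t$ is determined by information up to $t-1$ while $\varepsilon_t$ is drawn independently of it, both the sample mean of $\sigma_t\varepsilon_t$ and the sample correlation between $\sigma_t$ and $\varepsilon_t$ come out near zero.

```python
import numpy as np

# Arbitrary GARCH(1,1) parameters, for illustration only.
omega, alpha1, beta1 = 0.1, 0.1, 0.85

rng = np.random.default_rng(2)
n = 200_000
eps = rng.standard_normal(n)              # iid(0,1) innovations
sigma2 = np.empty(n)
sigma2[0] = omega / (1 - alpha1 - beta1)  # unconditional variance as start
u_prev = 0.0
for t in range(1, n):
    # sigma_t depends only on information up to t-1 ...
    sigma2[t] = omega + alpha1 * u_prev ** 2 + beta1 * sigma2[t - 1]
    # ... while eps_t is drawn independently of it.
    u_prev = np.sqrt(sigma2[t]) * eps[t]

sigma = np.sqrt(sigma2)
print("mean(sigma*eps) =", np.mean(sigma * eps))
print("corr(sigma,eps) =", np.corrcoef(sigma, eps)[0, 1])
```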

There are models in which the variance enters the conditional mean, for example the stochastic-volatility-in-mean model or the ARCH-in-mean (ARCH-M) model.

  • Yes, thank you, I can see this. But my point is: why can't we randomly simulate a path of future predictions? Please see my comment above, under the question. Commented Apr 9 at 8:57
  • I would agree with Richard. What you are describing sounds like a single simulated path. Typically for point predictions you would either use theory (like my answer above) or you would generate a large number of simulated paths, as you describe, and then take the relevant metric (mean, median, quantile, etc.) at each point in time. If you did what you are describing a large number of times and took the mean at each point in time, you should get arbitrarily close to the theoretical point forecast, since the error terms will average out to (nearly) zero. Commented Apr 9 at 16:13
  • Yes, agreed; I meant taking a large number of simulated paths. Thank you! Commented Apr 11 at 9:52
