
I have monthly time series $\{y_t\}$ and $\{x_t\}$ (continuous variables) with just over 200 observations. I model $y_t$ conditional on $x_t$ as follows: $$ y_t = \alpha_1 Jan_t + \dots + \alpha_{12} Dec_t + \beta_1 Jan_t x_t + \dots + \beta_{12} Dec_t x_t + \varepsilon_t \tag{1} $$ where $Jan_t, \dots, Dec_t$ are monthly dummies and $\varepsilon_t$ is both autocorrelated and heteroskedastic. Specifically, OLS estimation of regression $(1)$ yields residuals with statistically significant autocorrelation at lag 1 ($\rho(1)\approx 0.25$) and different variances in each calendar month.
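For concreteness, regression $(1)$ can be set up in R roughly as in the sketch below; `dat`, `y`, `x` and the 12-level factor `month` are illustrative names, not the exact objects I use.

    ## Minimal sketch of regression (1): with the intercept suppressed, each
    ## month gets its own level (alpha_m) and its own slope on x (beta_m).
    ## `dat` is assumed to contain y, x and a 12-level factor `month`.
    fit_ols <- lm(y ~ 0 + month + month:x, data = dat)
    coef(fit_ols)   # 12 monthly intercepts followed by 12 monthly slopes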

[Figure: OLS residuals (raw series, ACF, variance by calendar month)]

I am interested in testing $\text{H}_{0,A}\colon \alpha_1 = \dots = \alpha_{12} = 0$ as well as $\text{H}_{0,B}\colon \beta_1 = \dots = \beta_{12} = 0$.

If the errors were i.i.d., there would not be a problem. However, with autocorrelated and heteroskedastic errors, I am having some trouble. I have tried a few different approaches, but none seems to work, as indicated by simulations. I simulated data from $z_t = u_t$ with $u_t\sim i.i.d.(0, \sigma_{\varepsilon}^2)$, where $\sigma_{\varepsilon}^2$ is the estimated unconditional variance of $\varepsilon$ from regression $(1)$. That is, in my simulation both $\text{H}_{0,A}$ and $\text{H}_{0,B}$ hold. The error term is much simplified (perhaps I should not have done that), but it was the easiest place to start.
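The simulation can be sketched roughly as follows; `sigma2_hat` stands for the estimated unconditional residual variance from $(1)$, and `get_pvals()` is only a placeholder for whichever estimator/test combination is being evaluated.

    ## Rough sketch of the size simulation under both nulls; `dat` is the
    ## illustrative data frame from above, kept fixed except for the response.
    set.seed(1)
    n_sim <- 1000
    p_A <- p_B <- numeric(n_sim)
    for (s in seq_len(n_sim)) {
      sim_dat   <- dat                                    # keep x and month as observed
      sim_dat$y <- rnorm(nrow(dat), 0, sqrt(sigma2_hat))  # z_t = u_t, so both nulls hold
      p         <- get_pvals(sim_dat)                     # returns c(A = ..., B = ...)
      p_A[s]    <- p["A"]
      p_B[s]    <- p["B"]
    }
    mean(p_A < 0.05)   # empirical size for H0_A at the 5% level
    mean(p_B < 0.05)   # empirical size for H0_B at the 5% level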

The estimators I have tried and the corresponding simulation results are presented next.

  1. OLS point estimates with HAC covariance matrix for the coefficients. Specifically, I estimated regression $(1)$ by OLS and then used sandwich::NeweyWest with either default settings or with lag=1. I also tried sandwich::vcovHC(type = "HC3") just in case. The simulations showed a major tendency to overreject both $\text{H}_{0,A}$ and $\text{H}_{0,B}$ at test sizes of 5%, 1% and 0.1%. Here are some histograms of $p$-values to illustrate that (using NeweyWest):

[Figure: histograms of $p$-values from the OLS + Newey-West simulations]

The first row corresponds to testing $\text{H}_{0,A}$ and the second row to $\text{H}_{0,B}$. The different columns contain the same data, only binned at different levels of granularity. Under the null, the $p$-values should be uniformly distributed, but the histograms are clearly far from uniform.
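One way to compute the joint Wald tests with the Newey-West covariance (not necessarily exactly what I did, and reusing the illustrative `lm()` setup from above) is:

    library(sandwich)
    library(lmtest)

    ## Wald tests of H0_A and H0_B via nested model comparisons, using the
    ## Newey-West covariance of the full model.
    fit_full <- lm(y ~ 0 + month + month:x, data = dat)
    fit_noA  <- lm(y ~ 0 + month:x, data = dat)   # drops all alpha_m
    fit_noB  <- lm(y ~ 0 + month,   data = dat)   # drops all beta_m

    waldtest(fit_noA, fit_full, vcov = NeweyWest(fit_full), test = "Chisq")  # H0_A
    waldtest(fit_noB, fit_full, vcov = NeweyWest(fit_full), test = "Chisq")  # H0_B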

  2. GLS point estimates with the corresponding default covariance matrix for the coefficients, implemented as follows:

    nlme::gls(model   = z ~ 0 + dum + x:dum,
              data    = data.frame(z = z[, t], x = x[, t], dum = dum),
              corr    = corARMA(p = 1, q = 1),
              weights = varIdent(form = ~ 1 | month))

where month is a categorical variable indicating the calendar month of the year. (I also tried corr = corAR1.) The simulation results were better, but again, there was a major tendency to overreject both $\text{H}_{0,A}$ and $\text{H}_{0,B}$ at test sizes of 5%, 1% and 0.1%. (I had also expected the results to be better here, because the covariance structure is more restrictive than the one implied by NeweyWest, but unfortunately the results were still not satisfactory.)

[Figure: histograms of $p$-values from the GLS simulations]
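For the GLS fit, the joint tests can be obtained as Wald tests of whole blocks of coefficients, for example as sketched below (again not necessarily exactly what I did); `fit_gls` is assumed to hold the object returned by the gls() call above.

    library(nlme)

    ## Wald F-tests of whole terms: term 1 is the block of monthly dummies,
    ## term 2 the block of month-x interactions.
    anova(fit_gls, Terms = 1)   # H0_A: alpha_1 = ... = alpha_12 = 0
    anova(fit_gls, Terms = 2)   # H0_B: beta_1 = ... = beta_12 = 0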

  3. ARMA-GARCH point estimates with the default covariance matrix for the coefficients. I implemented a model similar to the GLS one above using rugarch but got poor estimates of the month-specific variances (they were all zero), so I did not proceed. I also considered tsgarch, which allows multiplicative external regressors in the variance equation (I thought that might work), but unlike rugarch it does not allow explicit modelling of the autocorrelation of the errors, so I did not proceed with it either.

Question: Any ideas on what else I could try? Or is this such a fundamental problem that there is no way around it? Also, I originally wanted to run this in rolling windows of just 120 observations (10 years) instead of the full 200+, but that would aggravate the small-sample problem even more...
