In BARRA-style factor models, asset returns are modelled as a linear function of factor exposures $$ R_t = X_t f_t + \varepsilon_t, $$ where $R_t$ is the return vector, $f_t$ the factor returns, and $\varepsilon_t$ the residual (asset-specific) return. $X_t$ is the factor exposure matrix, which contains information about each asset, such as its country, industry, or growth rate.
The coefficients $f_t$ and $\varepsilon_t$ are estimated using cross-sectional regression. Given a time series of factor and residual returns, we can then model their distribution and compute sample statistics (i.e. the factor covariance matrix and residual variances).
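For reference, a minimal sketch of the per-date cross-sectional regression, using plain OLS (actual BARRA-style estimation uses weighted regressions with exposure constraints, which I omit here):

```python
import numpy as np

def estimate_factor_returns(R_t, X_t):
    """Cross-sectional OLS for a single date t.

    R_t : (n_assets,) vector of asset returns
    X_t : (n_assets, n_factors) exposure matrix known at the start of t
    Returns the factor return vector f_t and residual returns eps_t.
    """
    f_t, *_ = np.linalg.lstsq(X_t, R_t, rcond=None)
    eps_t = R_t - X_t @ f_t
    return f_t, eps_t

# Stacking f_t and eps_t over dates gives the time series from which the
# factor covariance matrix and residual variances are estimated.
```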
The estimation universe typically contains a large number of global equities, traded across different time zones. If $R_t$ contains returns computed at each market's local close, this can lead to problems for model estimation. For example, if the (actual) asset returns are contemporaneously correlated, the asynchronous closes induce (cross-)autocorrelation in $R_t$. My question is: how do we properly account for this autocorrelation when estimating a factor model?
- As a concrete example: since Asian markets close relatively early, Asian asset returns positively correlate with the previous day's US returns (when measured at close). If not accounted for, this causes us to underestimate the correlation between same-day Asian and US returns.
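To illustrate, here is a toy simulation (purely illustrative, not part of the model): each day consists of two intraday shocks, the two assets' shocks have correlation $\rho$, the "US" close-to-close return covers both halves of its day, while the "Asian" close-to-close return covers the second half of the previous day plus the first half of the current day. The measured same-day correlation drops to about $\rho/2$, and a lead-lag correlation of the same size appears:

```python
import numpy as np

rng = np.random.default_rng(0)
rho, T = 0.8, 100_000

# Two intraday shocks per day; the two assets' shocks have correlation rho.
cov = [[1.0, rho], [rho, 1.0]]
shocks = rng.multivariate_normal([0.0, 0.0], cov, size=(T, 2))  # (day, half, asset)

# "US" close-to-close return for day t covers both halves of day t.
us = shocks[:, :, 0].sum(axis=1)

# "Asia" closes mid-day: its close-to-close return for day t covers the
# second half of day t-1 plus the first half of day t.
asia = shocks[:-1, 1, 1] + shocks[1:, 0, 1]

print("same-day corr:", np.corrcoef(asia, us[1:])[0, 1])   # ~ rho / 2
print("lead-lag corr:", np.corrcoef(asia, us[:-1])[0, 1])  # ~ rho / 2
```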
The obvious solution is to compute returns using the same (absolute) point in time. However, closing prices are often readily available, and using them avoids after-market prices, which can come with their own problems.
Another solution could be to add an additional factor to the exposure matrix $X_t$, such as market closing time, in the hope of capturing this effect. But how then do we interpret the return attributed to this factor, as well as the "risk" it implies?
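Purely as an illustration of what such an exposure might look like (all names and numbers here are hypothetical), one could append a cross-sectionally standardized closing-time offset as an extra column of $X_t$:

```python
import numpy as np

# Hypothetical "closing-time" exposure: hours between each asset's local close
# and a reference close, standardized cross-sectionally before use.
X_t = np.column_stack([np.ones(4), [0.2, -0.1, 0.5, 0.3]])  # toy exposures: market + one style factor
close_offset_hours = np.array([14.0, 13.0, 6.0, 0.0])       # e.g. Tokyo, HK, London, NYC vs. a NY reference close
z = (close_offset_hours - close_offset_hours.mean()) / close_offset_hours.std()
X_t_aug = np.column_stack([X_t, z])                         # extra column fed to the cross-sectional regression
```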
Edit: A third option is to aggregate returns over time. Suppose we have two stocks whose (log) returns are jointly normal with correlation $\rho$ and have independent increments. If the measured returns $\tilde{R}_t$ are computed over partially overlapping intervals, their correlation will be $a\rho$, where $a$ is the fraction of overlap.
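To spell out where the $a\rho$ comes from (taking unit variance per unit time for simplicity): split each measured return into the piece over the shared interval and the piece over the non-shared interval, $$\tilde{R}^{(1)}_t = A^{(1)}_t + B^{(1)}_t, \qquad \tilde{R}^{(2)}_t = A^{(2)}_t + B^{(2)}_t,$$ where the $A$'s cover the overlap (variance $a$) and the $B$'s cover disjoint intervals (variance $1-a$). Increments over disjoint intervals are independent, so only the overlap contributes to the covariance: $$\mathrm{Corr}\big(\tilde{R}^{(1)}_t, \tilde{R}^{(2)}_t\big) = \frac{\mathrm{Cov}\big(A^{(1)}_t, A^{(2)}_t\big)}{\sqrt{1 \cdot 1}} = a\rho.$$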
Aggregating across $N$ increments effectively changes the fraction of overlap to $a_N = 1-\frac{1-a}{N}$, so the measured correlation tends to the true correlation as the aggregation window increases.
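As a sanity check on the $a_N$ formula, here is the same toy two-shock setup as above (overlap fraction $a = 1/2$), aggregated over non-overlapping windows of $N$ days:

```python
import numpy as np

rng = np.random.default_rng(1)
rho, T = 0.8, 200_000

# Two intraday shocks per day; asset shocks have correlation rho (overlap a = 1/2).
cov = [[1.0, rho], [rho, 1.0]]
shocks = rng.multivariate_normal([0.0, 0.0], cov, size=(T, 2))  # (day, half, asset)

us = shocks[1:, :, 0].sum(axis=1)             # both halves of day t
asia = shocks[:-1, 1, 1] + shocks[1:, 0, 1]   # 2nd half of day t-1 + 1st half of day t

a = 0.5
for N in (1, 5, 20):
    n = (len(us) // N) * N
    us_agg = us[:n].reshape(-1, N).sum(axis=1)      # N-day aggregated returns
    asia_agg = asia[:n].reshape(-1, N).sum(axis=1)
    a_N = 1 - (1 - a) / N
    print(f"N={N:2d}  measured={np.corrcoef(asia_agg, us_agg)[0, 1]:.3f}  theory={a_N * rho:.3f}")
```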
However, aggregation throws out a lot of information, essentially decreasing the sample size by a factor of $N$.