EDIT If you have never used a Bayesian method before, you should seek detailed support from someone you work with. I state this because your comment has the feel of someone who hasn't done it before.

A Bayesian prior summarizes all of your knowledge about the location of the parameters in a model, not just $\beta$. In this case, if $\tilde{\mathbf{x}}$ is being treated as a parameter, then it also needs a prior.

A prior is proper if $$\int_{\theta\in\Theta}\pi(\theta)\mathrm{d}\theta=1.$$ It is improper if that statement does not hold. Frequentist solutions usually map to a Bayesian solution with an improper prior. The challenge is that for a normal likelihood with three or more independent variables, an improper prior will cause the posterior not to integrate to one. You will get paradoxes in your solution.
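
As a rough numerical illustration of the difference (my own sketch, with an arbitrary $N(0,10^2)$ prior as the proper example), the proper prior's total mass settles at one as the integration range grows, while a flat prior over the whole real line accumulates mass without bound:

```python
import numpy as np
from scipy.stats import norm

# My own numerical sketch of the distinction: a N(0, 10^2) prior is proper
# because its total mass converges to one as the range grows, while a flat
# "prior" pi(theta) = 1 on the real line is improper because its mass 2L
# grows without bound.
for L in [10.0, 100.0, 1000.0, 10000.0]:
    proper_mass = norm.cdf(L, scale=10) - norm.cdf(-L, scale=10)
    flat_mass = 2 * L   # integral of the constant 1 over [-L, L]
    print(f"[-{L:g}, {L:g}]  normal prior: {proper_mass:.4f}   flat prior: {flat_mass:g}")
```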

The prior comes from information outside the sample. You would assign a prior for each parameter on the full data set. The difference between Bayesian and Frequentist methods is that in Frequentist methods the sample is random, while in Bayesian methods the parameters are the random variables and the sample is treated as a constant. For the full set, you could assign a normal-gamma or normal-inverse-gamma distribution. For the partial data set, you would use the posterior of the full set as the prior for the new set, adding one more dimension for your believed distribution of $\tilde{\mathbf{x}}$.

You are correct that you would build your prior from the $\mathbf{x}_i$, but it wouldn't be a simulation. As you observe more and more data from outside the full set, the shape of that distribution will change: each observation triggers a Bayesian update, which changes the distribution of the $\mathbf{x}_i$.
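
As a sketch of what that updating could look like, suppose (my illustrative assumption) the $\mathbf{x}_i$ are modeled as normal with unknown mean and variance under a normal-inverse-gamma prior. Each observation updates the hyperparameters, and updating one observation at a time ends at the same posterior as updating on the full set at once:

```python
import numpy as np

# A minimal sketch (my own illustration) of conjugate normal-inverse-gamma
# updating for the mean and variance of the x's.  The hyperparameter names
# (mu0, kappa0, alpha0, beta0) and starting values are placeholders of mine.
def nig_update(mu0, kappa0, alpha0, beta0, x):
    """Posterior hyperparameters after observing the array x."""
    x = np.asarray(x, dtype=float)
    n, xbar = x.size, x.mean()
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = (beta0 + 0.5 * np.sum((x - xbar) ** 2)
              + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n))
    return mu_n, kappa_n, alpha_n, beta_n

rng = np.random.default_rng(0)
x_full = rng.normal(loc=2.0, scale=1.5, size=500)

# One-shot update on the full set ...
post_full = nig_update(0.0, 1.0, 2.0, 2.0, x_full)

# ... matches sequential updating, where each posterior becomes the next prior.
post_seq = (0.0, 1.0, 2.0, 2.0)
for xi in x_full:
    post_seq = nig_update(*post_seq, np.array([xi]))

print(np.allclose(post_full, post_seq))  # True: each observation updates the beliefs
```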

Disclaimer If I were doing it, I would abandon mean-variance finance; however, if my employer insisted that I do it, then I would treat the unobserved values as modeled parameters using a Bayesian method.

Most Bayesian methods seek a posterior of the form $$\Pr(\theta|\mathbf{x},\mathbf{y}),$$ where $\mathbf{x}$ and $\mathbf{y}$ are the data. Ignoring the fundamental differences of interpretation and calculation, this is no different from a Frequentist method: take your data and construct your estimator for $\theta$.

In this case, though, imagine some of $\mathbf{x}$ is unobserved because of a lack of overlap in trading times. People are still engaging in trades, with their mental models of the unrevealed prices intact. So let $\mathbf{x}_i$ be the set of observed cases and $\tilde{\mathbf{x}}$ the cases where there are no observable variables. The pairing on the $y$ side is $\mathbf{y}_i$ and $\mathbf{y}_j$, where $j$ indexes the periods with missing $x$'s.

Then, during the mutually observed period, I would calculate $$\Pr(\theta|\mathbf{x}_i,\mathbf{y}_i)$$ and use this relationship to form a prior distribution for the second case in order to solve $$\Pr(\theta;\tilde{\mathbf{x}}|\mathbf{y}_j).$$
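
Here is a minimal, self-contained sketch of that two-stage idea. It is my own illustration, not production code: a single regressor, the stage-one posterior summarized by a normal approximation, the believed distribution of $\tilde{\mathbf{x}}$ taken from the observed $\mathbf{x}_i$, and a plain random-walk Metropolis sampler over $(\beta,\log\sigma,\tilde{\mathbf{x}})$:

```python
import numpy as np

rng = np.random.default_rng(42)

# --- simulate the structure described above: x is unrevealed in some periods ---
beta_true, sigma_true = 0.8, 0.5              # illustrative "true" values, my choice
T = 120
x_all = rng.normal(loc=1.0, scale=1.2, size=T)
y_all = beta_true * x_all + rng.normal(scale=sigma_true, size=T)
missing = np.zeros(T, dtype=bool)
missing[::6] = True                            # every sixth period has no observable x
x_i, y_i = x_all[~missing], y_all[~missing]    # the mutually observed period
y_j = y_all[missing]                           # the periods with missing x's
m = int(missing.sum())

# --- stage 1: Pr(theta | x_i, y_i), summarized here by a normal approximation ---
beta_hat = np.sum(x_i * y_i) / np.sum(x_i ** 2)
sigma_hat = np.std(y_i - beta_hat * x_i, ddof=1)
beta_se = sigma_hat / np.sqrt(np.sum(x_i ** 2))
x_mean, x_sd = x_i.mean(), x_i.std(ddof=1)     # believed distribution of x-tilde

# --- stage 2: joint posterior Pr(beta, sigma, x_tilde | y_j) ------------------
def log_post(params):
    beta, log_sigma, x_tilde = params[0], params[1], params[2:]
    sigma = np.exp(log_sigma)
    loglik = np.sum(-0.5 * ((y_j - beta * x_tilde) / sigma) ** 2 - np.log(sigma))
    logprior = (-0.5 * ((beta - beta_hat) / beta_se) ** 2          # stage-1 result as prior
                - 0.5 * ((log_sigma - np.log(sigma_hat)) / 0.5) ** 2
                - 0.5 * np.sum(((x_tilde - x_mean) / x_sd) ** 2))  # prior on each x-tilde
    return loglik + logprior

# plain random-walk Metropolis over (beta, log sigma, x-tilde_1, ..., x-tilde_m)
n_iter, step = 20000, 0.05
draws = np.empty((n_iter, 2 + m))
cur = np.concatenate([[beta_hat, np.log(sigma_hat)], np.full(m, x_mean)])
cur_lp = log_post(cur)
for t in range(n_iter):
    prop = cur + step * rng.normal(size=cur.size)
    prop_lp = log_post(prop)
    if np.log(rng.uniform()) < prop_lp - cur_lp:
        cur, cur_lp = prop, prop_lp
    draws[t] = cur
draws = draws[n_iter // 4:]   # drop burn-in; columns are (beta, log sigma, x-tildes)
```

The fixed step size and the normal summary of the first stage are conveniences for the sketch; in practice you would tune the sampler and carry the full first-stage posterior forward as the prior.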

Once I had $\Pr(\theta;\tilde{\mathbf{x}}|\mathbf{y}_j)$, I would marginalize out $\tilde{\mathbf{x}}$. Assuming the relevant densities exist, $$\Pr(\theta|\mathbf{x}_i,\mathbf{y})=\int_{\tilde{\mathbf{x}}\in\chi}\Pr(\theta;\tilde{\mathbf{x}}|\mathbf{y}_j)\mathrm{d}\tilde{\mathbf{x}}.$$
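
As a small numerical check of what that integral does (a toy of mine in which the joint posterior is replaced by a bivariate normal), integrating the $\tilde{\mathbf{x}}$ coordinate out numerically recovers the analytic marginal for $\theta$:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm
from scipy.integrate import quad

# Toy stand-in for the joint posterior over (theta, x_tilde); values are arbitrary.
mean = np.array([0.8, 1.0])
cov = np.array([[0.04, 0.03], [0.03, 0.25]])
joint = multivariate_normal(mean, cov)

theta0 = 0.9
numeric = quad(lambda xt: joint.pdf([theta0, xt]), -np.inf, np.inf)[0]
analytic = norm.pdf(theta0, loc=mean[0], scale=np.sqrt(cov[0, 0]))
print(numeric, analytic)   # the two agree: x_tilde has been integrated out of existence
```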

In case you have not worked with Bayesian methods before: I treated the unobserved values as parameters to be estimated and then removed their effect by integrating out the uncertainty in them. Instead of having one value for the data, you have at each missing data point a distribution of possible values that could have obtained at that time. Of course, a distribution is not what you want, but you make that distribution vanish by integrating it out of existence. Given the theoretical underpinnings of the CAPM, I would then find the posterior mean for each $\beta$ and that would be my model.

Because of the very high dimensionality of the model, and in order to avoid marginalization paradoxes, you will want to put a very diffuse proper prior on each $\beta$ to guarantee the posterior integrates to unity. Given the general acceptance of LASSO and ridge regression in the field, I would center a normal density at or near zero for the beta parameters and use a very diffuse gamma distribution for the scale parameters. That would allow you to calculate the initial portion with a conjugate prior, though that will not be possible for the second portion.
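
A sketch of that kind of diffuse-but-proper conjugate setup for the initial, fully observed portion, using a normal-inverse-gamma prior on the coefficients and the variance (equivalently, a gamma prior on the precision); the hyperparameter values and simulated data below are illustrative choices of mine:

```python
import numpy as np

# Conjugate Bayesian regression with a normal-inverse-gamma prior:
# beta | sigma^2 ~ N(m0, sigma^2 V0),  sigma^2 ~ Inverse-Gamma(a0, b0).
def nig_regression_posterior(X, y, m0, V0, a0, b0):
    n = y.size
    V0_inv = np.linalg.inv(V0)
    Vn_inv = V0_inv + X.T @ X
    Vn = np.linalg.inv(Vn_inv)
    mn = Vn @ (V0_inv @ m0 + X.T @ y)
    an = a0 + n / 2.0
    bn = b0 + 0.5 * (y @ y + m0 @ V0_inv @ m0 - mn @ Vn_inv @ mn)
    return mn, Vn, an, bn

rng = np.random.default_rng(3)
n, k = 200, 3
X = rng.normal(size=(n, k))
y = X @ np.array([0.9, -0.4, 0.2]) + rng.normal(scale=0.5, size=n)

m0 = np.zeros(k)        # centered at zero, in the spirit of ridge/LASSO shrinkage
V0 = 1e4 * np.eye(k)    # very diffuse, but proper, so the posterior normalizes
a0, b0 = 0.01, 0.01     # very diffuse prior on the scale
mn, Vn, an, bn = nig_regression_posterior(X, y, m0, V0, a0, b0)
print(mn)               # posterior mean of beta; sigma^2 | data is Inverse-Gamma(an, bn)
```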

See, for example, https://www.faculty.agecon.vt.edu/AAEC5126/module6/BayesNormalConjugate.pdf