In the Bayesian linear regression model
$$f({\bf x}) = {\bf x}^T {\bf w}, \qquad y = f({\bf x}) + \epsilon,$$
with Gaussian noise $\epsilon \sim N(0, \sigma_n^2)$ and a Gaussian prior on the weights ${\bf w} \sim N({\bf 0}, \Sigma_p)$, one can show that the posterior over the weights given training data $X, {\bf y}$ is also Gaussian:
$$p({\bf w} \mid X, {\bf y}) = N\!\left(\bar{\bf w} = \frac{1}{\sigma_n^2} A^{-1} X {\bf y},\; A^{-1}\right),$$
where $A = \sigma_n^{-2} X X^T + \Sigma_p^{-1}$ (so the training inputs are the columns of $X$).

To find the predictive distribution $p(f({\bf x_*}))$ at a new test point ${\bf x_*}$, we average over all possible weights (sometimes termed the Bayesian estimator):
$$p(f({\bf x_*}) \mid {\bf x_*}, X, {\bf y}) = \int p(f({\bf x_*}) \mid {\bf x_*}, {\bf w})\, p({\bf w} \mid X, {\bf y})\, d{\bf w},$$
where the first factor is a point mass at ${\bf x_*}^T {\bf w}$, since $f({\bf x_*}) = {\bf x_*}^T {\bf w}$ is deterministic given $\bf w$.

I'd like to prove that this predictive distribution is again Gaussian,
$$f({\bf x_*}) \mid {\bf x_*}, X, {\bf y} \sim N\!\left(\frac{1}{\sigma_n^2} {\bf x_*}^T A^{-1} X {\bf y},\; {\bf x_*}^T A^{-1} {\bf x_*}\right),$$
but something in the integration doesn't seem to work out.
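For what it's worth, a quick Monte Carlo sanity check agrees with the claimed mean and variance, so I suspect the problem is in my algebra rather than in the formula. Here is a minimal numpy sketch (the dimensions, $\sigma_n$, $\Sigma_p$, and the data are made-up toy values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy problem: D = 2 features, n = 5 observations.
D, n = 2, 5
sigma_n = 0.5                    # noise standard deviation
Sigma_p = np.eye(D)              # prior covariance of the weights
X = rng.normal(size=(D, n))      # inputs stored as columns, as above
w_true = rng.normal(size=D)
y = X.T @ w_true + sigma_n * rng.normal(size=n)
x_star = rng.normal(size=D)      # test point

# Posterior over w: N(w_bar, A^{-1}) with A = sigma_n^{-2} X X^T + Sigma_p^{-1}
A = X @ X.T / sigma_n**2 + np.linalg.inv(Sigma_p)
A_inv = np.linalg.inv(A)
w_bar = A_inv @ X @ y / sigma_n**2

# Monte Carlo: sample w from the posterior, push through f(x_*) = x_*^T w
w_samples = rng.multivariate_normal(w_bar, A_inv, size=200_000)
f_star = w_samples @ x_star

print("MC mean:     ", f_star.mean())
print("Claimed mean:", x_star @ A_inv @ X @ y / sigma_n**2)
print("MC var:      ", f_star.var())
print("Claimed var: ", x_star @ A_inv @ x_star)
```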
Can someone show how the integration is done?