I'm trying to understand Gaussian Processes. Could anyone tell me:
- Why we need to use the log marginal likelihood?
- Why using log, the marginal likelihood can be decomposed to 3 terms (including a fitting term and a penalty term)?
The marginal likelihood is generally used as a measure of how well the model fits the data. You obtain the marginal likelihood of a process by marginalizing over the quantities that govern it (for a GP, the latent function values). In general this integral is not available in closed form and has to be approximated; for GP regression with Gaussian noise, however, it can be computed exactly, and taking its logarithm gives a sum of a data-fit term, a complexity-penalty term, and a normalization constant, which I suppose is the decomposition you mentioned in point 2.
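Writing the decomposition out for GP regression with Gaussian observation noise (here $\mathbf{y}$ are the $n$ observations, $K$ the kernel matrix on the training inputs, and $\sigma_n^2$ the noise variance):

$$\log p(\mathbf{y}\mid X) \;=\; \underbrace{-\tfrac{1}{2}\,\mathbf{y}^\top (K+\sigma_n^2 I)^{-1}\mathbf{y}}_{\text{data-fit term}} \;\underbrace{-\,\tfrac{1}{2}\log\bigl\lvert K+\sigma_n^2 I\bigr\rvert}_{\text{complexity penalty}} \;\underbrace{-\,\tfrac{n}{2}\log 2\pi}_{\text{normalization}}$$

The first term rewards kernels that explain the observed $\mathbf{y}$; the second (the log-determinant) grows with the "volume" of functions the kernel can produce, so it penalizes overly flexible models; the third is a constant that does not depend on the hyperparameters.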
The likelihood is generally computed on a logarithmic scale for numerical-stability reasons. Consider a computer that can store only numbers between 99,000 and 0.001 (three decimal places) plus the sign. If you compute a density that at some point takes the value 0.0023456789, the computer will store it as 0.002, losing part of the real value; if you compute it in log scale, log(0.0023456789) = −6.05518 will be stored as −6.055, losing much less precision than on the original scale. If you multiply many small values, the situation gets worse: 0.0023456789² ≈ 0.0000055 would be stored as 0.000, while log(0.0023456789²) = −12.11036 is still perfectly representable.
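To make this concrete, here is a small Python sketch (the density value and the number of factors are chosen purely for illustration) showing the same effect in ordinary double precision: the product of many small densities underflows to exactly 0.0, while the sum of their logs stays finite.

```python
import math

# 200 hypothetical density evaluations, all equal to a small value.
densities = [0.0023456789] * 200

# Direct product: underflows, since 0.0023456789**200 ~ 1e-526,
# far below the smallest positive double (~1e-308).
prod = 1.0
for d in densities:
    prod *= d

# Log-scale: sum of logs, each around -6.055, total around -1211.
log_sum = sum(math.log(d) for d in densities)

print(prod)     # 0.0 (underflow)
print(log_sum)  # about -1211.04, still representable
```

This is why libraries expose functions like `logpdf` alongside `pdf`: the log of a product of likelihoods becomes a sum of log-likelihoods, which never underflows for any realistic number of data points.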