I am trying to find the hyperparameters of a Gaussian process regression algorithm using sklearn. The book (Rasmussen) says I should maximize the log marginal likelihood given by $$\log p(\mathbf{y}|X,\mathbf{\theta})=-\frac{1}{2} \mathbf{y}^T K_y^{-1}\mathbf{y}-\frac{1}{2}\log\det(K_y)-\frac{n}{2}\log(2\pi)$$ So I start from an RBF kernel in sklearn with some parameters (can they be simple and arbitrary, say just both 1.0?) and then try to find the correct $\theta$? I don't understand this approach: should I do this for each label in my dataset in bulk? Or consider one point of my training set at a time and update the parameters at each iteration? I apologise for the confused question, but can somebody explain how to start implementing this method?
1 Answer
The Gaussian process is a Bayesian model. It uses Bayesian updating, so it doesn't matter whether you process the data one sample at a time or all at once; the result would be the same. There is no reason to tune the hyperparameters on a subsample of your data, other than using a held-out test set for validation.
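To make this concrete, here is a minimal sketch of how sklearn does the tuning for you: `fit()` maximizes the log marginal likelihood over the entire training set in one go, starting from whatever initial kernel parameters you supply (the toy data and initial values of 1.0 are just illustrative assumptions):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy 1-D regression data, purely for illustration.
rng = np.random.RandomState(0)
X = rng.uniform(0, 5, 30)[:, None]
y = np.sin(X).ravel() + 0.1 * rng.randn(30)

# The starting values (here 1.0) are only an initial guess for the optimizer;
# fit() then maximizes log p(y | X, theta) computed on the WHOLE dataset.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5).fit(X, y)

print(gpr.kernel_)  # kernel with optimized hyperparameters
print(gpr.log_marginal_likelihood(gpr.kernel_.theta))  # value at the optimum
```

Note that there is no per-sample loop anywhere: the optimizer (L-BFGS-B by default) evaluates the single scalar log marginal likelihood, which already aggregates over all $n$ training points.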
- Thanks for your reply. My misunderstanding is perhaps at an even more basic level: that optimization formula depends on a single label, so how is the hyperparameter tuning to happen in practice? I implement that formula for a single $y$ and then? How would I even do it for all my training labels "at once"? – noesis, Oct 18, 2021 at 18:26
- @noesis you are optimizing some metric, for example mean squared error calculated over the whole data. This is the same no matter what ML model you use. – Tim, Oct 18, 2021 at 18:30
- I am completely lost. I supposedly need to find the $\theta$ that minimizes that expression. Are you saying there is something else I should be minimizing? – noesis, Oct 18, 2021 at 18:34
- @noesis in the case of a Gaussian process you could be maximizing the marginal log-likelihood, sure. Still, this is a single-number metric, aggregated over the whole dataset. – Tim, Oct 18, 2021 at 18:55
- Ok, thank you. That formula depends on a single label $y$, so how do I implement this maximization in practice? Do it wrt one $y$, obtain new parameters for the kernel, and do it again on the next $y$ with the new kernel? How would I do it "in bulk"? – noesis, Oct 18, 2021 at 19:02