How are hyperpriors included? #800
-
Hi! My question is: how are the prior distributions for the hyperparameters included? Is it actually the marginal likelihood multiplied by the hyperprior that is optimized, or just the marginal likelihood? My understanding is that for a fully Bayesian treatment one would set up a joint posterior distribution over the target function and the hyperparameters, and then marginalize over the hyperparameters to obtain the posterior over the target function. I thought this would both learn the hyperparameters and account for prior knowledge about them. In BoTorch there is an intermediate step where the MLL is optimized; maybe it corresponds to the same thing? I would be thankful for a clarification. Also, thanks for a super nice BO tool!
-
In all of the included examples we use a MAP estimate, so the objective that is optimized is the (log) marginal likelihood plus the log density of the hyperpriors, i.e., the unnormalized log posterior of the hyperparameters. So yes, what gets maximized corresponds to the marginal likelihood multiplied by the hyperprior.
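A minimal sketch of what this looks like (not the exact code from the BoTorch examples; names reflect a recent BoTorch/GPyTorch install, and `SingleTaskGP`'s defaults already register similar priors). GPyTorch's marginal log likelihood adds the log density of any registered priors to the MLL, so maximizing it maximizes `log p(D | theta) + log p(theta)`:

```python
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll  # fit_gpytorch_model in older versions
from gpytorch.kernels import MaternKernel, ScaleKernel
from gpytorch.mlls import ExactMarginalLogLikelihood
from gpytorch.priors import GammaPrior

train_X = torch.rand(20, 2, dtype=torch.double)
train_Y = torch.sin(train_X.sum(dim=-1, keepdim=True))

# Hyperpriors are registered directly on the kernel hyperparameters.
covar_module = ScaleKernel(
    MaternKernel(
        nu=2.5,
        ard_num_dims=2,
        lengthscale_prior=GammaPrior(3.0, 6.0),  # hyperprior on the lengthscales
    ),
    outputscale_prior=GammaPrior(2.0, 0.15),  # hyperprior on the outputscale
)
model = SingleTaskGP(train_X, train_Y, covar_module=covar_module)
mll = ExactMarginalLogLikelihood(model.likelihood, model)
fit_gpytorch_mll(mll)  # MAP fit: marginal likelihood x hyperprior is maximized
```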
You're correct about the fully Bayesian treatment. We have been doing some work on a fully Bayesian treatment using Pyro's NUTS MCMC sampler: essentially, we run MCMC inference to draw samples from the hyperparameter posterior (instead of finding the/a mode), load these samples into a batched GP model (as in this gpytorch tutorial), compute the acquisition function in a batched fashion, and then marginalize across the hyperparameter samples. @sdaulton do we have any concrete plans for open-sourcing / making a tutorial for this?
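For readers finding this later: recent BoTorch releases include a workflow along these lines (a fully Bayesian SAAS GP fit with Pyro's NUTS). A minimal sketch, assuming a BoTorch version where these names exist; the acquisition value is computed per hyperparameter sample and averaged, which is the marginalization described above:

```python
import torch
from botorch.models.fully_bayesian import SaasFullyBayesianSingleTaskGP
from botorch.fit import fit_fully_bayesian_model_nuts
from botorch.acquisition import qExpectedImprovement
from botorch.optim import optimize_acqf

train_X = torch.rand(20, 4, dtype=torch.double)
train_Y = torch.sin(train_X.sum(dim=-1, keepdim=True))

gp = SaasFullyBayesianSingleTaskGP(train_X, train_Y)
# NUTS draws samples from the hyperparameter posterior; after thinning they
# are loaded into a batched GP (one batch element per hyperparameter sample).
fit_fully_bayesian_model_nuts(gp, warmup_steps=256, num_samples=128, thinning=16)

# The acquisition function is evaluated in a batched fashion across the
# hyperparameter samples and averaged, i.e., marginalized over them.
acqf = qExpectedImprovement(model=gp, best_f=train_Y.max())
candidate, _ = optimize_acqf(
    acqf,
    bounds=torch.stack([torch.zeros(4), torch.ones(4)]).to(train_X),
    q=1,
    num_restarts=10,
    raw_samples=256,
)
```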