How are hyperpriors included? #800
-
Hi! My question is: how are the prior distributions for the hyperparameters included? Is it actually the marginal likelihood multiplied by the hyperprior that is optimized, or just the marginal likelihood? My understanding is that for a fully Bayesian treatment one would set up a joint posterior distribution over the target function and the hyperparameters, and then marginalize over the hyperparameters to obtain the posterior over the target function. I thought this would both learn the hyperparameters and account for prior knowledge about them. In BoTorch there is an intermediate step where the MLL is optimized; maybe it corresponds to the same thing? I would be thankful for a clarification. Also, thanks for a super nice BO tool!
-
In all of the included examples we use a MAP estimate, so the objective that is optimized is the (log) marginal likelihood plus the log density of the hyperpriors, i.e., the unnormalized log posterior of the hyperparameters. So yes, what gets maximized corresponds to the marginal likelihood multiplied by the hyperprior.
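A minimal sketch of what this looks like (not the exact code from the BoTorch examples; names reflect a recent BoTorch/GPyTorch install, and `SingleTaskGP`'s defaults already register similar priors). GPyTorch's marginal log likelihood adds the log density of any registered priors to the MLL, so maximizing it maximizes `log p(D | theta) + log p(theta)`:

```python
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll  # fit_gpytorch_model in older versions
from gpytorch.kernels import MaternKernel, ScaleKernel
from gpytorch.mlls import ExactMarginalLogLikelihood
from gpytorch.priors import GammaPrior

train_X = torch.rand(20, 2, dtype=torch.double)
train_Y = torch.sin(train_X.sum(dim=-1, keepdim=True))

# Hyperpriors are registered directly on the kernel hyperparameters.
covar_module = ScaleKernel(
    MaternKernel(
        nu=2.5,
        ard_num_dims=2,
        lengthscale_prior=GammaPrior(3.0, 6.0),  # hyperprior on the lengthscales
    ),
    outputscale_prior=GammaPrior(2.0, 0.15),  # hyperprior on the outputscale
)
model = SingleTaskGP(train_X, train_Y, covar_module=covar_module)
mll = ExactMarginalLogLikelihood(model.likelihood, model)
fit_gpytorch_mll(mll)  # MAP fit: marginal likelihood x hyperprior is maximized
```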
You're correct about the fully Bayesian treatment. We have been doing some work on a fully Bayesian treatment using Pyro's NUTS MCMC sampler: essentially, we run MCMC inference to draw samples from the hyperparameter posterior (instead of finding the/a mode), load these samples into a batched GP model (as in this gpytorch tutorial), compute the acquisition function in a batched fashion, and then marginalize across the hyperparameter samples. @sdaulton do we have any concrete plans for open-sourcing / making a tutorial for this?
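For readers finding this later: recent BoTorch releases include a workflow along these lines (a fully Bayesian SAAS GP fit with Pyro's NUTS). A minimal sketch, assuming a BoTorch version where these names exist; the acquisition value is computed per hyperparameter sample and averaged, which is the marginalization described above:

```python
import torch
from botorch.models.fully_bayesian import SaasFullyBayesianSingleTaskGP
from botorch.fit import fit_fully_bayesian_model_nuts
from botorch.acquisition import qExpectedImprovement
from botorch.optim import optimize_acqf

train_X = torch.rand(20, 4, dtype=torch.double)
train_Y = torch.sin(train_X.sum(dim=-1, keepdim=True))

gp = SaasFullyBayesianSingleTaskGP(train_X, train_Y)
# NUTS draws samples from the hyperparameter posterior; after thinning they
# are loaded into a batched GP (one batch element per hyperparameter sample).
fit_fully_bayesian_model_nuts(gp, warmup_steps=256, num_samples=128, thinning=16)

# The acquisition function is evaluated in a batched fashion across the
# hyperparameter samples and averaged, i.e., marginalized over them.
acqf = qExpectedImprovement(model=gp, best_f=train_Y.max())
candidate, _ = optimize_acqf(
    acqf,
    bounds=torch.stack([torch.zeros(4), torch.ones(4)]).to(train_X),
    q=1,
    num_restarts=10,
    raw_samples=256,
)
```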