Hi all,

I'm working on optimizing a multi-objective biological process where experimental evaluations are extremely expensive. To mitigate costs, we initially conducted a space-filling design to obtain some high-fidelity experimental data. Additionally, we have a mathematical model of the process, which, while less reliable than real experiments, is orders of magnitude cheaper to evaluate (about 100,000x cheaper).

Goal:
I want to generate new experimental candidates in parallel (multiple candidates) that maximize the process outcome, but I do not want to suggest low-fidelity evaluations. The model should only be used to inform and improve our high-fidelity experimental selection.

Proposed approach:
I'm currently considering the following:
Questions:
Many thanks for your insights!
Hi @EvanClaes, a general reference for multi-objective multi-fidelity optimization (esp. within botorch) is our 2023 ICML paper on the Hypervolume Knowledge Gradient: https://proceedings.mlr.press/v202/daulton23a.html. There is also a tutorial on decoupled MOBO (though that may not be 100% relevant in your context, where you can "precompute" a bunch of low-fidelity evaluations): https://botorch.org/docs/tutorials/decoupled_mobo/
IIUC this means that there are two discrete fidelities: costly full eval and mathematical model (but no feature that would describe the level of approximation), right?
With that you can follow the rest of your proposed approach, where the fixed feature is just the high-fidelity task index. Hope this helps - let us know if you have additional questions.
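In case a concrete example helps, below is a minimal sketch of one way to wire this up in BoTorch. It is not the exact code from the proposal above: the data tensors (`X_lo`, `X_hi`, `Y_lo`, `Y_hi`), the reference point, and the bounds are placeholders, and restricting `output_tasks` to the high-fidelity task is used here as one way to realize the fixed high-fidelity task index (keeping the task column as an optimization variable and fixing it via `optimize_acqf`'s `fixed_features` argument is the alternative described above).

```python
# Hedged sketch; all data tensors and hyperparameters below are placeholders.
import torch
from botorch.models import ModelListGP, MultiTaskGP
from botorch.models.transforms.outcome import Standardize
from botorch.fit import fit_gpytorch_mll
from gpytorch.mlls import SumMarginalLogLikelihood
from botorch.acquisition.multi_objective.monte_carlo import (
    qNoisyExpectedHypervolumeImprovement,
)
from botorch.optim import optimize_acqf

# X_lo, X_hi: (n_lo, d), (n_hi, d) design points; Y_lo, Y_hi: (n_lo, m), (n_hi, m) outcomes.
# Task 0 = cheap mathematical model, task 1 = high-fidelity experiment.
def with_task(X, task):
    return torch.cat([X, torch.full_like(X[..., :1], task)], dim=-1)

train_X = torch.cat([with_task(X_lo, 0.0), with_task(X_hi, 1.0)])

# One MultiTaskGP per objective; output_tasks=[1] means the model is trained on
# both fidelities but only ever predicts the high-fidelity (experimental) task.
models = []
for j in range(Y_hi.shape[-1]):
    train_Y = torch.cat([Y_lo[:, j : j + 1], Y_hi[:, j : j + 1]])
    models.append(
        MultiTaskGP(
            train_X,
            train_Y,
            task_feature=X_hi.shape[-1],  # index of the task column (last column)
            output_tasks=[1],             # only predict the high-fidelity task
            outcome_transform=Standardize(m=1),
        )
    )
model = ModelListGP(*models)
fit_gpytorch_mll(SumMarginalLogLikelihood(model.likelihood, model))

# Generate q parallel high-fidelity candidates with qNEHVI over the design
# variables only (the task feature never appears in the optimization variables).
acqf = qNoisyExpectedHypervolumeImprovement(
    model=model,
    ref_point=ref_point,  # length-m list/tensor: worst acceptable value per objective
    X_baseline=X_hi,      # previously evaluated high-fidelity designs (no task column)
    prune_baseline=True,
)
candidates, _ = optimize_acqf(
    acq_function=acqf,
    bounds=bounds,  # (2, d) tensor of lower/upper bounds on the design variables
    q=4,            # number of parallel experiments to propose
    num_restarts=10,
    raw_samples=512,
)
```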
Hi Evan,
The main thing the ICM kernel (MultiTaskGP) does is help with the shared assumptions re: how important each input is (the kernel lengthscale). It will also borrow strength across the data and can handle various linear or potentially non-linear correlations between the tasks. As a rule of thumb, you want the Pearson correlation between the tasks to be at least 0.4 or so.

https://jmlr.org/papers/volume20/18-225/18-225.pdf provides a fairly self-contained tutorial and intuition around MTGPs for multi-fidelity modeling (see e.g., S3.1 and particularly S6).
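If it is useful, here is a minimal sketch for checking that rule of thumb on your own data (the array names are placeholders; it assumes you have simulator predictions and experimental measurements at the same high-fidelity design points):

```python
# Hypothetical check of the >= 0.4 rule of thumb; array names are placeholders.
import numpy as np

y_sim = np.asarray(sim_outputs_at_hifi_designs)  # simulator outputs, shape (n_hi,)
y_exp = np.asarray(experimental_outputs)         # measured outcomes, shape (n_hi,)

r = np.corrcoef(y_sim, y_exp)[0, 1]  # Pearson correlation between the two tasks
print(f"Pearson correlation between tasks: {r:.2f}")
```

Since the Pearson correlation is unchanged by a constant positive scaling, a multiplicative calibration factor like the one mentioned in the quoted message below would not affect this check.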
Best,
e
On Mon, Mar 3, 2025 at 12:16 PM EvanClaes wrote:
Hello Max,

Thank you so much for this feedback. I indeed have discrete fidelities, so your proposed solution makes sense.

Maybe one more question on the low-fidelity data. The process model we have captures the dynamics of the real process relatively well, although the predictions are sometimes off by a certain factor. We can calibrate the model with this factor. Would you say that this is important for the MultiTaskGP (does it care about absolute values), or is it sufficient that the general trends/dynamics are represented accurately?

Would you mind leaving this open for a couple more days, in case I encounter some issues with the implementation? I can probably try this out by the beginning of next week.

Have a great day!
Hello Max,
I solved the second issue (all candidates being identical); it was caused by a problem with my constraint.
I can still provide a toy example if you suspect the warning message may be problematic. Otherwise, you can close this topic. The candidates that are being produced make sense to me.
Thanks for all the help!