Looking for some guidance with a larger project. Also, a small problem: using too much memory. Tips? #1698
-
Hi @AlexStreicher, thanks for the kind words! I am not surprised that you're running into memory issues with q=200; jointly optimizing the acquisition function over a batch that large is very memory-intensive.
Another option that might be a good fit for your high-throughput use case is the TuRBO algorithm. It is designed for settings where you need thousands of evaluations and can generate possibly hundreds of candidates in parallel. You can find the tutorial here: https://botorch.org/tutorials/turbo_1
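To give a flavor of what TuRBO does, here is a minimal sketch of its trust-region bookkeeping, with illustrative names (the tutorial linked above has the complete, BoTorch-integrated version):

```python
# Minimal sketch of TuRBO-style trust-region bookkeeping. This is illustrative,
# not the library API; all names and default values here are placeholders.
from dataclasses import dataclass

@dataclass
class TrustRegionState:
    dim: int
    length: float = 0.8          # trust-region side length, in [0, 1]^d units
    length_min: float = 0.5 ** 7
    length_max: float = 1.6
    success_counter: int = 0
    failure_counter: int = 0
    success_tolerance: int = 3
    failure_tolerance: int = 5

def update_state(state: TrustRegionState, improved: bool) -> TrustRegionState:
    """Expand the trust region after repeated successes, shrink it after
    repeated failures, and restart it when it collapses below length_min."""
    if improved:
        state.success_counter += 1
        state.failure_counter = 0
    else:
        state.failure_counter += 1
        state.success_counter = 0
    if state.success_counter == state.success_tolerance:
        state.length = min(2.0 * state.length, state.length_max)
        state.success_counter = 0
    elif state.failure_counter == state.failure_tolerance:
        state.length /= 2.0
        state.failure_counter = 0
    if state.length < state.length_min:  # trust region collapsed -> restart
        state.length = 0.8
    return state
```

The point is that candidate generation is restricted to a box around the current best point, which keeps the acquisition optimization cheap even when you generate hundreds of candidates in parallel.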
-
Hi there, firstly I wanted to thank you all for the time you've put into making your work publicly available, writing a solid API, documenting as much as you can, and writing tutorials. My background is in physics, so it's thanks to your efforts that I've been able to make progress.
I've been working on essentially building a simple auto-ML system. For reference, the function I hope to optimize is indeed black-box. I can use anywhere from $[1, \infty)$ features in doing so, some of which may be discrete or categorical (so far, snapping values or one-hot encoding has worked out okay; see the sketch below). Furthermore, I can simultaneously query about q ~ 100 points at a time on a cluster, and one such query takes about 1-4 hours. One twist: the signal-to-noise ratio is extremely small, so most of the domain/features will result in a value of effectively zero. If you have any tips regarding this, please let me know.
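For reference, the snapping I mentioned looks roughly like this (the dimension indices below are made up; it just rounds integer dimensions and hardens one-hot blocks after the continuous optimizer proposes candidates):

```python
# Hypothetical sketch of "snapping" relaxed candidates back onto the discrete
# grid: round integer dimensions and argmax each one-hot block.
import torch

def snap(X: torch.Tensor, int_dims: list[int], onehot_blocks: list[list[int]]) -> torch.Tensor:
    """Round integer dims and harden one-hot blocks of a candidate batch X (q x d)."""
    X = X.clone()
    X[..., int_dims] = X[..., int_dims].round()
    for block in onehot_blocks:
        hard = torch.zeros_like(X[..., block])
        hard.scatter_(-1, X[..., block].argmax(dim=-1, keepdim=True), 1.0)
        X[..., block] = hard
    return X

# e.g.: candidates = snap(candidates, int_dims=[0, 3], onehot_blocks=[[5, 6, 7]])
```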
Without consideration for feature selection or the issues caused by ignoring it, I've been trying to write a BoTorch module to perform the exploitation part of the auto-ML for me.
I've been running into a small issue, though: it seems that I have too much data! I got a "can't allocate memory: you tried to allocate 2.9 GB" error at the

batch_initial_conditions = ic_gen(...)

step of `optimize_acqf`, when running with N = 2500 points, k = 17 input dimensions, q = 200, num_restarts = 8, and raw_samples = 1024.
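In case it's useful, here's roughly the shape of the call that triggers this, with `acqf` and `bounds` as placeholder names, plus the knobs I gather from the docs might bound the memory use (`sequential=True` and the `batch_limit`/`init_batch_limit` options). Treat it as a sketch rather than my exact code:

```python
# Hypothetical sketch of the failing call; `acqf` (the acquisition function)
# and `bounds` (a 2 x 17 tensor of box bounds) are placeholders.
from botorch.optim import optimize_acqf

candidates, acq_value = optimize_acqf(
    acq_function=acqf,
    bounds=bounds,
    q=200,
    num_restarts=8,
    raw_samples=1024,
    sequential=True,             # pick the 200 points greedily, one at a time
    options={
        "batch_limit": 4,        # evaluate at most 4 restarts simultaneously
        "init_batch_limit": 32,  # chunk the raw-sample evaluations in ic_gen
    },
)
```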
I put in the noise constraint because I feel there isn't really a notion of "measurement noise" for my dataset: for every set of feature values, there's a single fixed Y accuracy value. Let me know if you don't think that was the proper way to deal with this.
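Concretely, here's a minimal sketch of what I did, with placeholder data (the `GreaterThan(1e-6)` lower bound stands in for my actual constraint):

```python
# Minimal sketch: bound the GP's inferred observation noise from below so the
# model treats the data as (effectively) noiseless. Data here is random filler.
import torch
from botorch.models import SingleTaskGP
from gpytorch.constraints import GreaterThan
from gpytorch.likelihoods import GaussianLikelihood

train_X = torch.rand(2500, 17, dtype=torch.double)  # N=2500 points, k=17 dims
train_Y = torch.rand(2500, 1, dtype=torch.double)

likelihood = GaussianLikelihood(noise_constraint=GreaterThan(1e-6))
model = SingleTaskGP(train_X, train_Y, likelihood=likelihood)
```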
Finally, I just wanted to say that I'm not some rando dumping their entire problem on you without putting any effort in. I always got annoyed when people did that with my work. As someone who has no background in any of this, and almost none in ML, I'm quite happy that I've gotten this far! Of course, it's all thanks to the documentation, tutorials, and papers that you guys wrote.
Please let me know if there's any more information I can provide to help.