Understanding and applying results of bayesopt
I have some difficulties understanding the Matlab documentation of the bayesopt function.
For example, the bestPoint function offers a couple of "best points" of a Bayesian optimization result. Which one should be used in order to get the best out-of-sample predictive accuracy?
Let's say I let bayesopt find the "best" hyperparameters for a regression tree ensemble (using the hyperparameter optimization built into fitrensemble rather than calling bayesopt directly) and obtain the following result graphs:
What, if anything, do the two graphs tell me about the "best point", convergence, predictive accuracy, etc., both in general and for this example in particular? Are there any sources that explain these concepts, at least at a high level, so that I can make better use of bayesopt?
Don Mathis on 16 Sep 2019
Edited: Don Mathis on 16 Sep 2019
It looks like no new minima are being found, and that the model of the objective function is stabilizing, but it's not a good model. The model has minima that are negative. A negative value for log(1+Loss) implies that Loss<0, which is impossible for MSE loss.
I've seen this happen when there is a steep "cliff" in the objective function (over hyperparameter space). The Gaussian Process model of that function smooths out the cliff and thereby undershoots the true function (and zero) at the base of the cliff. In fact, the reason the objective function for optimizing regression fit functions is defined as log(1+Loss) instead of Loss is to shrink such cliffs and so reduce the chance of undershoots like this.
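As a rough illustration of why the log(1+Loss) transform shrinks cliffs, the sketch below applies it to a few made-up loss values. The numbers are hypothetical and only meant to show the compression; they are not taken from the plots in the question.

```matlab
% Hypothetical MSE values spanning several orders of magnitude
% (illustrative only, not from the question's data).
loss = [0.5, 50, 5000];

% Objective as described above: log(1 + Loss).
% log1p(x) computes log(1 + x) accurately for small x.
obj = log1p(loss);

% Height of the "cliff" before and after the transform.
rawRange = max(loss) - min(loss);
objRange = max(obj)  - min(obj);
fprintf('Raw loss range: %g, log(1+Loss) range: %g\n', rawRange, objRange);

% For any nonnegative loss the transformed objective is also nonnegative,
% so a negative model estimate cannot correspond to a real MSE.
isNonNegative = all(log1p([0, 1e-6, 10]) >= 0);
```

The transformed range is far smaller than the raw range, but a large enough jump in Loss can still leave a visible cliff for the Gaussian Process to smooth over.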
To diagnose this, you could look at the values of the objective function that are being found, to see if they differ by orders of magnitude.
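One way to run that check, sketched under the assumption that the observed objective values are available as a column vector (for a fitted model they should be in mdl.HyperparameterOptimizationResults.ObjectiveTrace when the default bayesopt optimizer is used). The vector below is a hypothetical stand-in so the snippet runs on its own.

```matlab
% Hypothetical observed objective values, standing in for
% results.ObjectiveTrace from a real optimization run.
objTrace = [6.9; 4.2; 0.35; 0.31; 5.8; 0.30];

% Undo the log(1 + Loss) transform to get back the loss itself.
loss = expm1(objTrace);

% Orders of magnitude between the worst and best observed loss.
spread = log10(max(loss) / min(loss));
fprintf('Observed losses span about %.1f orders of magnitude\n', spread);

% Sorted view makes a cliff (a sudden jump between neighbors) easy to spot.
sortedLoss = sort(loss);
disp(sortedLoss.');
```

A spread of several orders of magnitude, with a clear gap in the sorted values, would be consistent with the cliff explanation.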
Regarding bestPoint: since the model is not giving a reasonable estimate of the minimum of the objective function, it would probably be better to trust the minimum observed point and use the 'min-observed' criterion.
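A minimal sketch of selecting that criterion. To keep it self-contained it runs bayesopt on a toy one-dimensional objective, which is an assumption for illustration and not the ensemble loss from the question; it requires Statistics and Machine Learning Toolbox.

```matlab
% Toy objective for illustration only: minimum at x = 1.
xvar = optimizableVariable('x', [-5, 5]);
fun  = @(t) (t.x - 1).^2;      % bayesopt passes a one-row table t

results = bayesopt(fun, xvar, ...
    'MaxObjectiveEvaluations', 15, ...
    'Verbose', 0, 'PlotFcn', []);

% Best point by the lowest objective value actually observed.
[xObs, fObs] = bestPoint(results, 'Criterion', 'min-observed');

% Default criterion, which relies on the Gaussian Process model.
[xDef, fDef] = bestPoint(results);

disp(xObs); disp(fObs);
disp(xDef); disp(fDef);
```

For a model returned by fitrensemble with the default optimizer, the same call should work on mdl.HyperparameterOptimizationResults in place of results.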
More Answers (1)
Don Mathis on 19 Sep 2019
Edited: Don Mathis on 19 Sep 2019
You already have access to some of those options (e.g., AcquisitionFunctionName) through fitrensemble, via the 'HyperparameterOptimizationOptions' name-value pair. https://www.mathworks.com/help/stats/fitrensemble.html?searchHighlight=fitrensemble&s_tid=doc_srchtitle#d117e406156
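For example, the acquisition function could be changed like this. The data are synthetic and the option values (acquisition function, number of evaluations) are arbitrary choices for the sketch, not recommendations; it requires Statistics and Machine Learning Toolbox.

```matlab
rng(0);                                  % reproducible synthetic data
X = rand(200, 3);
y = sin(2*pi*X(:,1)) + 0.5*X(:,2) + 0.1*randn(200, 1);

% Options are passed as a struct; field names match the bayesopt options.
opts = struct('AcquisitionFunctionName', 'expected-improvement-plus', ...
              'MaxObjectiveEvaluations', 10, ...
              'ShowPlots', false, 'Verbose', 0);

mdl = fitrensemble(X, y, ...
    'OptimizeHyperparameters', 'auto', ...
    'HyperparameterOptimizationOptions', opts);

% With the default optimizer this is a BayesianOptimization object.
results = mdl.HyperparameterOptimizationResults;
disp(class(results));
```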
I wouldn't expect increasing NumSeedPoints to help. Changing the acquisition function may help, but it's hard to know without more information. You can also try a grid search or random search via the 'Optimizer' field of the struct you pass to 'HyperparameterOptimizationOptions'. Neither of those methods would use a model at all, so it would at least be interesting to see what you get in that case. I would also look at the command-line output and all the plots to see what it's exploring. You can view all the plots by calling plot(model.HyperparameterOptimizationResults,'all') on your final model.
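A sketch of the random search variant, again on synthetic data with an arbitrary evaluation budget (both are assumptions for illustration). Because no model of the objective is fitted, the results come back as a table of evaluated points rather than a BayesianOptimization object, so the plot call above applies only to the bayesopt case.

```matlab
rng(0);                                  % reproducible synthetic data
X = rand(200, 3);
y = sin(2*pi*X(:,1)) + 0.5*X(:,2) + 0.1*randn(200, 1);

% Random search: no Gaussian Process model of the objective is involved.
opts = struct('Optimizer', 'randomsearch', ...
              'MaxObjectiveEvaluations', 10, ...
              'ShowPlots', false, 'Verbose', 0);

mdl = fitrensemble(X, y, ...
    'OptimizeHyperparameters', 'auto', ...
    'HyperparameterOptimizationOptions', opts);

% Table of the hyperparameter combinations tried and their observed objective.
tbl = mdl.HyperparameterOptimizationResults;
disp(head(tbl));
```

Comparing the lowest observed objective here with the one from Bayesian optimization gives a model-free sanity check on the minimum.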
In any case, if it is a "cliff" scenario, it's likely that it has found the best point(s) at the bottom of the cliff, even if it expects the loss to be less than zero there.
Also: IsObjectiveDeterministic is already set to false for fitrensemble, and yes it can help even if the function is actually deterministic.