Multiple trainings / multiple NN initialisations per hyperparameter validation with Optuna and pruning

Question

I am just doing my first ML-with-Optuna project. My question is: how can I evaluate one set of hyperparameters for multiple NN initializations, where each run within one trial is still subject to pruning?

I am assuming that the initialization has quite some influence and I don't want to strike out good HP due to bad luck.

As far as I know, each trial represents one set of HP. So if I want to evaluate them for multiple initializations, I have to perform multiple trainings per trial. But within one trial I can only report one value per step.

Do I have to implement this without optuna? Should I go for an approach which lets optuna first suggest a set of HP and then fixes it for the next trials? Or do you know some good approach to achieve this with optuna?

Many thanks in advance!

Edit 1: Adding a minimal code example:

    from random import randrange

    import optuna


    def objective(trial):
        """
        return x * 20 + random_offset
        multiplication calculated iteratively to enable pruning
        """
        x = trial.suggest_float("x", 0, 10)
        random_offset = randrange(0, 1000)
        temp = 1
        res_temp = None
        for i in range(20):
            temp += x
            res_temp = temp + random_offset
            trial.report(res_temp, i)
            if trial.should_prune():
                raise optuna.TrialPruned()
        return res_temp


    if __name__ == '__main__':
        study = optuna.create_study(pruner=optuna.pruners.MedianPruner())
        study.optimize(objective, n_trials=20)
        print("best params:", study.best_params)
        print("best value:", study.best_value)

This example tries to find the "x" in a range of 0 to 10 which minimizes "x * 20". The obvious answer is 0. The objective function calculates the result by iterative summation, which makes pruning possible. Sadly, the objective function is noisy due to the random offset. This is meant as a metaphor for training a NN: the iteration is the training loop, x is the hyperparameter, and the offset is the random initialization of the network.

The problem caused by the noise is that you can't determine the quality of a hyperparameter for sure, as the result might be dominated by the random offset. This might lead to selecting a sub-optimal x. If I am right, then increasing the number of trials to smooth out the randomness might not work, as Optuna might suggest new hyperparameters based on the old ones. So unlucky observations will hinder further progress.

So I assumed it would be best to evaluate the objective several times for the same set of hyperparameters and only remember the best "run".
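To make the effect of repeated evaluation concrete, here is a minimal sketch that uses only the standard library and no Optuna. It reuses the toy objective above (which actually evaluates to 1 + 20 * x plus the offset) and compares the spread of a single noisy evaluation with the spread of the mean over several evaluations. The helper names `noisy_objective`, `averaged_objective` and `spread`, and the choice of 9 repeats, are illustrative assumptions.

```python
import random
import statistics


def noisy_objective(x, rng):
    """Toy objective from the example: 1 + 20 * x plus a random offset."""
    return 1 + 20 * x + rng.randrange(0, 1000)


def averaged_objective(x, rng, n_repeats):
    """Mean of n_repeats independent evaluations at the same x."""
    return statistics.mean(noisy_objective(x, rng) for _ in range(n_repeats))


def spread(evaluate, n_samples=2000):
    """Sample standard deviation of repeated calls to evaluate()."""
    return statistics.stdev(evaluate() for _ in range(n_samples))


rng = random.Random(0)  # fixed seed so the comparison is reproducible
x = 3.0
single = spread(lambda: noisy_objective(x, rng))
mean_of_9 = spread(lambda: averaged_objective(x, rng, 9))
print(f"std of a single evaluation: {single:.1f}")
print(f"std of the mean of 9 runs:  {mean_of_9:.1f}")
```

For independent offsets, the standard deviation of the mean shrinks roughly with the square root of the number of repeats, so nine repeats cut the noise to about a third. Keeping only the best run also reduces the spread, but it biases the estimate towards lucky offsets.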

So my question is how best to smooth out the noise. Is my assumption correct that only increasing the number of trials is not the best approach, and how would you implement the repeated evaluation?

@ferdy I added an example and some explanation. Does this help you understand the issue?
– Osmosis D. Jones, Commented Jan 31, 2022 at 13:33

opocaj · Accepted Answer · 2022-06-27 22:26:45Z

A way to achieve this is to define a wrapper around the objective. This works because this wrapper will be called once for a new trial, but inside the wrapper we call the original objective multiple times.

Toy example:

    import optuna
    import random


    def objective(trial, seed=0):
        random.seed(seed)
        a = trial.suggest_float('test', 0, 1)
        return a


    def objective_wrapper(trial, nrseeds):
        res = []
        for ii in range(nrseeds):
            rr = objective(trial, seed=ii)
            res.append(rr)

        # add the individual results as an attribute to the trial if you want
        trial.set_user_attr("individual_seed_results", res)

        # let's print just to visualize the individual runs
        print('=====')
        print(res)

        return sum(res)/len(res)  # could be some other aggregation


    study = optuna.create_study(
        study_name='tst',
    )

    study.optimize(
        lambda trial: objective_wrapper(trial, 3),
        n_trials=5,
    )

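If you run this, the wrapper will print something like the output below. The three values in each row are identical because the toy objective simply returns the suggested value, and Optuna returns the same value for a given parameter name within one trial; the seed has no effect here. In a real objective the seed would change the result of each run.

    =====
    [0.9422219634474698, 0.9422219634474698, 0.9422219634474698]
    =====
    [0.3789947506000524, 0.3789947506000524, 0.3789947506000524]
    =====
    [0.25406979924952877, 0.25406979924952877, 0.25406979924952877]
    =====
    [0.6927210276975587, 0.6927210276975587, 0.6927210276975587]
    =====
    [0.3583263556988684, 0.3583263556988684, 0.3583263556988684]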

ferdy · Accepted Answer · 2022-02-01 04:14:39Z

Since your objective now also depends on randomness, it is best to evaluate the objective several times, as you assumed.

Even better, try to identify where the randomness comes from. Is it from the seed number? If not, then you really need more trials and more evaluations over complete epochs.

It would look something like this, based on the Optuna example. Each epoch or step, the model is evaluated n_train_iter times for the same parameter.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import train_test_split

    import optuna

    X, y = load_iris(return_X_y=True)
    X_train, X_valid, y_train, y_valid = train_test_split(X, y)
    classes = np.unique(y)


    def objective(trial):
        alpha = trial.suggest_float("alpha", 0.0, 1.0)
        clf = SGDClassifier(alpha=alpha)
        n_train_iter = 100

        for step in range(n_train_iter):
            clf.partial_fit(X_train, y_train, classes=classes)

            intermediate_value = clf.score(X_valid, y_valid)
            trial.report(intermediate_value, step)

            if trial.should_prune():
                raise optuna.TrialPruned()

        return clf.score(X_valid, y_valid)


    study = optuna.create_study(
        direction="maximize",
        pruner=optuna.pruners.MedianPruner(
            n_startup_trials=5, n_warmup_steps=30, interval_steps=10
        ),
    )
    study.optimize(objective, n_trials=20)

You can go further by calling

    X_train, X_valid, y_train, y_valid = train_test_split(X, y)

multiple times just to find the best objective value.

First: many thanks for helping :) 2nd paragraph: The randomness is from the seed number. Isn't that nearly always the case on a PC? Fixing the seed number doesn't seem quite like the right approach; it feels more like cheating, like remembering the one lucky trial. If you go for that, you could choose all HP randomly and then only remember the best seed, couldn't you? I agree that it is worth making the training as robust as possible by identifying sources of randomness. 3rd paragraph: The statement doesn't seem to fit the code. Each step (loop body) evaluates only once. Did I miss something?
