8
$\begingroup$

I'm trying to compare Mathematica's NetTrain with other algorithms, and I used the Automatic settings because they were working quite well.

Now I need to include those results in a paper, but I don't know which method Mathematica chose for training.

There are three of these: ADAM, RMSProp, and SGD. By switching between them and comparing the performance with Automatic, I found that ADAM seems to be the one used in my case. But I still don't know many of the parameters, such as "LearningRate" and "LearningRateSchedule". Is there a way to find them out?

$\endgroup$
1
  • 1
    $\begingroup$ Sometimes you can get at these with Internal`InheritedBlock. Do you have a sample net to work with? The settings may change. In any case you can always look through the DownValues with GeneralUtilities`PrintDefinitionsLocal. $\endgroup$ Commented Aug 14, 2017 at 7:56

1 Answer

9
$\begingroup$

For now, you can use the "cheat code" NeuralNetworks`Private`MXTrainer`$OptimizerSpec, which will show the current defaults for the different methods (some internal details are mixed in here):

In[32]:= NeuralNetworks`Private`MXTrainer`$OptimizerSpec /. 
           NeuralNetworks`Defaulting[_, d_] :> d // GeneralUtilities`PrettyForm

Out[32]= <|
  ADAM -> {
    {ADAM, #} &,
    <|Beta1 -> 0.9, Beta2 -> 0.999, Epsilon -> Rational[1, 100000]|>
  },
  SGD -> {
    {SGD, #} &,
    <|Momentum -> 0.93, LearningRateSchedule -> Polynomial|>
  },
  RMSProp -> {
    {RMSProp, #} &,
    <|Beta -> 0.95, Momentum -> 0.9, Epsilon -> Rational[1, 100000000]|>
  },
  $CommonSuboptions -> <|
    L2Regularization -> ArrayCasesT[Nullable[TensorT[{}, RealT]], 0.],
    GradientClipping -> ArrayCasesT[Nullable[TensorT[{}, RealT]], None],
    WeightClipping -> ArrayCasesT[Nullable[TensorT[{}, RealT]], None],
    InitialLearningRate -> Automatic,
    LearningRateSchedule -> None,
    GradientsBag -> None,
    GradientMapper -> RMSEnergy
  |>
|>
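If you need the settings in a reproducible form, you can also pass the method and its hyperparameters to NetTrain explicitly rather than relying on Automatic. A minimal sketch, assuming the ADAM defaults shown above; trainingData is a placeholder for your own data, and the learning-rate suboption appears as "InitialLearningRate" in this version's spec (newer versions document it as "LearningRate"):

    net = NetChain[{LinearLayer[16], Ramp, LinearLayer[1]}, "Input" -> 4];
    trained = NetTrain[net, trainingData,
      Method -> {"ADAM",
        "Beta1" -> 0.9, "Beta2" -> 0.999, "Epsilon" -> 1*^-5,
        "InitialLearningRate" -> 0.001 (* assumed value; omit to keep Automatic *)}]

With the method spelled out like this, the exact optimizer settings can be quoted verbatim in the paper.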

The rule is currently that SGD is used for networks with fewer than 128 parameters (total weight components); otherwise ADAM is used.
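A quick way to check which side of that threshold a given network falls on is to count its weight components. A sketch, assuming the "ArraysTotalElementCount" property of NetInformation is available in your version:

    net = NetInitialize[NetChain[{LinearLayer[16], Ramp, LinearLayer[1]}, "Input" -> 4]];
    NetInformation[net, "ArraysTotalElementCount"]
    (* 97 = (4*16 + 16) + (16*1 + 1); fewer than 128, so Automatic picks SGD here *)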

$\endgroup$
3
  • $\begingroup$ What is the default learning rate for each of these methods? $\endgroup$ Commented Dec 10, 2017 at 3:15
  • $\begingroup$ For the ADAM case, I can see that $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-5}$. But what is the main parameter $\alpha$? $\endgroup$ Commented Jan 25, 2018 at 1:18
  • 1
    $\begingroup$ This doesn't work on Mathematica 11.3. $\endgroup$ Commented Jan 25, 2018 at 1:28
