8
$\begingroup$

I'm trying to compare Mathematica's NetTrain with other algorithms, and I used the Automatic settings because they were working quite well.

Now I need to include those results in a paper, but I don't know which method Mathematica chose for training.

There are three of these: ADAM, RMSProp, and SGD. By switching between them and comparing the performance with Automatic, I found that ADAM seems to be the one used in my case. But I still don't know many of the parameters, such as "LearningRate" and "LearningRateSchedule". Is there a way to find them out?

$\endgroup$
1
  • 1
    $\begingroup$ Sometimes you can get at these with Internal`InheritedBlock. Do you have a sample net to work with? The settings may change. In any case you can always look through the DownValues with GeneralUtilities`PrintDefinitionsLocal. $\endgroup$ Commented Aug 14, 2017 at 7:56

1 Answer

9
$\begingroup$

For now, you can use the "cheat code" NeuralNetworks`Private`MXTrainer`$OptimizerSpec, which will show the current defaults for the different methods (some internal details are mixed in here):

In[32]:= NeuralNetworks`Private`MXTrainer`$OptimizerSpec /. 
           NeuralNetworks`Defaulting[_, d_] :> d // GeneralUtilities`PrettyForm

Out[32]= <|
  ADAM -> {
    {ADAM, #} &,
    <|Beta1 -> 0.9, Beta2 -> 0.999, Epsilon -> Rational[1, 100000]|>
  },
  SGD -> {
    {SGD, #} &,
    <|Momentum -> 0.93, LearningRateSchedule -> Polynomial|>
  },
  RMSProp -> {
    {RMSProp, #} &,
    <|Beta -> 0.95, Momentum -> 0.9, Epsilon -> Rational[1, 100000000]|>
  },
  $CommonSuboptions -> <|
    L2Regularization -> ArrayCasesT[Nullable[TensorT[{}, RealT]], 0.],
    GradientClipping -> ArrayCasesT[Nullable[TensorT[{}, RealT]], None],
    WeightClipping -> ArrayCasesT[Nullable[TensorT[{}, RealT]], None],
    InitialLearningRate -> Automatic,
    LearningRateSchedule -> None,
    GradientsBag -> None,
    GradientMapper -> RMSEnergy
  |>
|>
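If you need the settings in a reproducible form, you can also pass the method and its hyperparameters to NetTrain explicitly rather than relying on Automatic. A minimal sketch, assuming the ADAM defaults shown above; trainingData is a placeholder for your own data, and the learning-rate suboption appears as "InitialLearningRate" in this version's spec (newer versions document it as "LearningRate"):

    net = NetChain[{LinearLayer[16], Ramp, LinearLayer[1]}, "Input" -> 4];
    trained = NetTrain[net, trainingData,
      Method -> {"ADAM",
        "Beta1" -> 0.9, "Beta2" -> 0.999, "Epsilon" -> 1*^-5,
        "InitialLearningRate" -> 0.001 (* assumed value; omit to keep Automatic *)}]

With the method spelled out like this, the exact optimizer settings can be quoted verbatim in the paper.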

The rule is currently that SGD is used for networks with fewer than 128 parameters (total weight components); otherwise ADAM is used.
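A quick way to check which side of that threshold a given network falls on is to count its weight components. A sketch, assuming the "ArraysTotalElementCount" property of NetInformation is available in your version:

    net = NetInitialize[NetChain[{LinearLayer[16], Ramp, LinearLayer[1]}, "Input" -> 4]];
    NetInformation[net, "ArraysTotalElementCount"]
    (* 97 = (4*16 + 16) + (16*1 + 1); fewer than 128, so Automatic picks SGD here *)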

$\endgroup$
3
  • $\begingroup$ What is the default learning rate for each of these methods? $\endgroup$ Commented Dec 10, 2017 at 3:15
  • $\begingroup$ For the ADAM case, I can see that $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-5}$. But what is the main parameter $\alpha$? $\endgroup$ Commented Jan 25, 2018 at 1:18
  • 1
    $\begingroup$ This doesn't work on Mathematica 11.3. $\endgroup$ Commented Jan 25, 2018 at 1:28
