Here are some of the things you could do:

  • When using the softmax cross-entropy function: the softmax numerator should never contain exact zeros, because of the exponential. However, due to floating-point precision, the numerator can be a very small value, say exp(-50000), which effectively evaluates to zero. (ref.) A numerically stable softmax is sketched after this list.

  • Quick fixes could be to either increase the precision of your model (using 64-bit floats instead of, presumably, 32-bit floats), or introduce a function that clamps your values, so anything at or below zero is replaced with a small positive number before taking the log. For example, use X = np.log(np.maximum(x, 1e-9)) before going into the softmax. (ref.)
  • You can use methods like "FastNorm", which improves numerical stability and reduces accuracy variance, enabling a higher learning rate and offering better convergence. (ref.)

  • Check weight initialization: if unsure, use Xavier or He initialization. Also, your initialization might be leading you to a bad local minimum, so try a different initialization and see if it helps. A short initialization sketch follows this list.

  • Decrease the learning rate, especially if you are getting NaNs in the first 100 iterations.

  • NaNs can arise from division by zero, or from taking the natural log of zero or of a negative number.

  • Try evaluating your network layer by layer to see where the NaNs first appear; a NaN-checking sketch is included after this list.
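
For the softmax point above, here is a minimal sketch of the standard max-subtraction trick that keeps the exponentials in a safe range. This is a NumPy illustration under my own naming, not code from the referenced posts:

    import numpy as np

    def stable_softmax(logits):
        # Subtract the row-wise maximum so the largest exponent is exp(0) = 1;
        # this avoids overflow and keeps the denominator well away from zero.
        shifted = logits - np.max(logits, axis=-1, keepdims=True)
        exp = np.exp(shifted)
        return exp / np.sum(exp, axis=-1, keepdims=True)

    def stable_log_softmax(logits):
        # Compute log-softmax directly instead of log(softmax(x)),
        # so we never take the log of a value that has underflowed to zero.
        shifted = logits - np.max(logits, axis=-1, keepdims=True)
        return shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))

    # Example: these logits overflow a naive exp(), but work fine here.
    probs = stable_softmax(np.array([[1000.0, 1001.0, 999.0]]))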
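
For the initialization point, a minimal NumPy sketch of Xavier (Glorot) and He initialization; the function names and the fan_in/fan_out arguments (the layer's input and output sizes) are illustrative, not tied to any particular framework:

    import numpy as np

    def xavier_init(fan_in, fan_out):
        # Xavier/Glorot: variance 2 / (fan_in + fan_out),
        # a common default for tanh/sigmoid layers.
        return np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / (fan_in + fan_out))

    def he_init(fan_in, fan_out):
        # He: variance 2 / fan_in, the usual choice for ReLU layers.
        return np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)

    W1 = he_init(784, 256)  # e.g. a 784 -> 256 ReLU layer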
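
For the layer-by-layer check, one way to find the first layer that produces a NaN or Inf; layer_outputs is assumed to be whatever list of (name, activation array) pairs you can collect from a forward pass (for example via hooks), so treat the names here as placeholders:

    import numpy as np

    def first_bad_layer(layer_outputs):
        # layer_outputs: list of (layer_name, activation_array) pairs
        # collected during a single forward pass.
        for name, activations in layer_outputs:
            if not np.all(np.isfinite(activations)):
                return name  # first layer whose output contains NaN or Inf
        return None  # every layer produced finite values

Once you know the offending layer, inspect its inputs, weights, and the operations it performs (divisions, logs, exponentials) for the failure modes listed above.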

Some of these suggestions were taken from two great posts, on StackOverflow and on KDnuggets; please refer to them for more detail.
