Why does Q-learning diverge?

My Q-learning agent's state-action values keep diverging to infinity, which means its weights are diverging too. I use a neural network as the function approximator for the Q-values.

I've tried:

  • Clipping the TD target, reward + discount * maximum action-value, to the range [-50, 50] (see the sketch after this list)
  • Setting a low learning rate (0.00001, with classic backpropagation for the weight updates)
  • Decreasing the values of the rewards
  • Increasing the exploration rate
  • Normalizing the inputs to the range 1 to 100 (previously 0 to 1)
  • Changing the discount rate
  • Reducing the number of layers of the neural network (just as a sanity check)

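For reference, here is a minimal sketch of my update step. It is simplified to a single linear layer as a stand-in for my actual multi-layer network, and all names, values, and the dummy transition are placeholders, but it shows the clipped TD target and the low learning rate described above:

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_actions = 4, 2
W = rng.normal(scale=0.1, size=(n_actions, n_features))  # stand-in weights

alpha = 0.00001   # the low learning rate I tried
gamma = 0.99      # discount rate (I also tried changing this)
CLIP = 50.0       # clip range for the TD target

def q_values(state):
    # Q(s, a) for all actions; my real network is multi-layer.
    return W @ state

def update(state, action, reward, next_state):
    global W
    # TD target: reward + discount * max over next action-values,
    # clipped to [-50, 50] as described in the list above.
    target = reward + gamma * np.max(q_values(next_state))
    target = np.clip(target, -CLIP, CLIP)
    td_error = target - q_values(state)[action]
    # Semi-gradient step on the squared TD error (target held fixed).
    W[action] += alpha * td_error * state

# Dummy transition just to show the call:
s = rng.normal(size=n_features)
s2 = rng.normal(size=n_features)
update(s, action=0, reward=1.0, next_state=s2)
```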
I've heard that Q-learning is known to diverge when combined with non-linear function approximation, but is there anything else I can try to stop the weights from diverging?
