My Q-learning algorithm's value estimates keep diverging to infinity, which means the network weights diverge as well. I use a neural network as the function approximator for the Q-values.
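For concreteness, here is a minimal sketch of the kind of update I mean. PyTorch is used here purely for illustration; the layer sizes, names, and hyperparameters are placeholders rather than my actual setup:

```python
import torch
import torch.nn as nn

# Illustrative network: maps a state vector to one Q-value per action.
# (Sizes and names are placeholders, not my real architecture.)
q_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.SGD(q_net.parameters(), lr=1e-5)  # the low learning rate mentioned below
gamma = 0.99

def td_update(state, action, reward, next_state, done):
    """One semi-gradient Q-learning step on a single transition."""
    q_pred = q_net(state)[action]                     # Q(s, a)
    with torch.no_grad():                             # target is held fixed during backprop
        q_next = 0.0 if done else q_net(next_state).max().item()
        target = reward + gamma * q_next              # reward + discount * max_a' Q(s', a')
    loss = (q_pred - target) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Example call with a random 4-dimensional state:
# td_update(torch.randn(4), action=0, reward=1.0, next_state=torch.randn(4), done=False)
```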
I've tried:
- Clipping the TD target (reward + discount * max Q-value of the next state's actions) to the range [-50, 50]; see the sketch after this list
- Setting a low learning rate (0.00001; I use classic backpropagation to update the weights)
- Decreasing the values of the rewards
- Increasing the exploration rate
- Rescaling the inputs to the range 1~100 (previously they were in 0~1)
- Changing the discount rate
- Reducing the number of layers in the neural network (just as a sanity check)
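For the first item in the list, the clipping I mean is essentially the following (an illustrative helper; the ±50 bounds are the ones given above):

```python
# Clipped TD target from the first bullet: reward + discount * max_a' Q(s', a')
# is clamped to [-50, 50] before being used as the regression target.
TARGET_MIN, TARGET_MAX = -50.0, 50.0

def clipped_td_target(reward, gamma, max_next_q, done):
    raw = reward if done else reward + gamma * max_next_q
    return min(TARGET_MAX, max(TARGET_MIN, raw))
```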
I've heard that Q-learning is known to diverge with non-linear function approximators, but is there anything else I can try to stop the weights from diverging?