
Deep Q-Learning for physical quantity: q-values distribution not as expected

Setting

I am trying to learn a specific physical quantity (radiance) inside a 3D scene with Deep Q-Learning. As a quick overview, my agent shoots rays inside the scene, and the reward is the irradiance of the points hit. This means that a reward is given only when a light source is hit, which happens only 1% of the time. This leads to a very sparse reward function.
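
To illustrate how sparse this reward is, here is a minimal sketch. Only the 1% hit rate comes from my setup; `LIGHT_IRRADIANCE`, `sample_reward` and `nonzero_fraction` are hypothetical names and values used for illustration, and the random draw stands in for the actual ray/scene intersection.

```python
import random

HIT_PROBABILITY = 0.01   # a light source is hit only about 1% of the time
LIGHT_IRRADIANCE = 1.0   # assumed placeholder value, not a measured quantity


def sample_reward(rng):
    """Return the reward for one ray: non-zero only when a light is hit."""
    hit_light = rng.random() < HIT_PROBABILITY  # stand-in for ray tracing
    return LIGHT_IRRADIANCE if hit_light else 0.0


def nonzero_fraction(n_rays, seed=0):
    """Fraction of rays that receive a non-zero reward."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_rays) if sample_reward(rng) > 0.0)
    return hits / n_rays
```

With these assumptions, roughly 99 out of 100 transitions in the replay buffer carry a reward of zero.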

My state is a tuple of the 3D spatial coordinates inside the scene, and my actions are the possible discrete directions in which the agent scatters rays. The q-value of an action represents this physical quantity for that specific direction.
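
As a rough sketch of what "discrete directions as actions" means here: the hemisphere grid, the bin counts and the function names below are assumptions chosen for illustration, not my actual discretization.

```python
import math

N_THETA = 4                   # assumed number of polar-angle bins
N_PHI = 8                     # assumed number of azimuth bins
N_ACTIONS = N_THETA * N_PHI   # one action per direction bin


def action_to_direction(action):
    """Map an action index to a unit vector on the upper hemisphere.

    The direction is expressed in a local frame where z is the surface
    normal (an assumption for this sketch). Bin centres are used.
    """
    if not 0 <= action < N_ACTIONS:
        raise ValueError("action index out of range")
    i_theta, i_phi = divmod(action, N_PHI)
    theta = (i_theta + 0.5) * (math.pi / 2) / N_THETA
    phi = (i_phi + 0.5) * (2 * math.pi) / N_PHI
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )


def greedy_action(q_values):
    """Index of the largest q-value, i.e. the direction the agent prefers."""
    return max(range(len(q_values)), key=q_values.__getitem__)
```

The network then takes the 3D position as input and outputs one q-value per action, so the vector of outputs can be read as a discretized estimate of the quantity over directions.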

Problem

I expect the q-values to be higher in the first 10 actions, and then slightly decrease. This would reflect the physics of my system. When the training starts, this is actually the case:

[Figure: q-values per action at the start of training]

After some episodes, the q-value of one action starts to spike, as seen in the figure below. This does not reflect the physics of the environment, for which the incoming radiance should be distributed over all the actions.

[Figure: q-values per action after some episodes, with one action spiking]

I know about target networks and experience replay (uniform and PER), but these did not solve my issue.
Are there any methods that I can try to deal with this problem? Thanks

maurock