
I've been working through the Sutton and Barto RL text, implementing a number of the algorithms and running them in the OpenAI Gym. One phenomenon I come across quite regularly is that an agent that, at some point during training, appears to be making good progress toward learning a plausible state-value or action-value function will "catastrophically forget" what it has learned and subsequently never recover.

To make this more concrete, here's the (smoothed) reward history of my implementation of a semi-gradient expected SARSA agent with linear function approximation and binary features running on the MountainCar environment.

[Figure: smoothed reward history over training episodes]

Problem Details

  • The problem has a bounded, continuous state space and a discrete, 3-action action space.
  • The learner receives a reward of $-1$ at each timestep. The episode ends when the car reaches the top of the hill, or after 200 timesteps without success.
  • Prior to learning, I tile-coded the state space using an 8x8 grid with 8 overlapping tilings, generated using Sutton's own tiles.py script.
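For reference, the tiling scheme above can be sketched as follows. This is an illustrative stand-in for Sutton's tiles.py (which uses hashing); the uniform offsets, bounds, and index layout here are my own assumptions:

```python
def tile_indices(pos, vel, n_tilings=8, grid=8,
                 lo=(-1.2, -0.07), hi=(0.6, 0.07)):
    """Return one active tile index per tiling for a MountainCar state.

    Each tiling is an 8x8 grid shifted by a fraction of one tile width,
    so the 8 active binary features jointly resolve the state more
    finely than any single grid. Total features: 8 * 8 * 8 = 512.
    """
    # scale each state dimension to [0, 1]
    sx = (pos - lo[0]) / (hi[0] - lo[0])
    sy = (vel - lo[1]) / (hi[1] - lo[1])
    active = []
    for t in range(n_tilings):
        off = t / (n_tilings * grid)  # per-tiling offset
        ix = min(int((sx + off) * grid), grid - 1)
        iy = min(int((sy + off) * grid), grid - 1)
        active.append(t * grid * grid + ix * grid + iy)
    return active
```

With binary features like these, computing Q(s, a) reduces to summing 8 weights, one per tiling.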

Learner Details

  • The learner is an implementation of the semi-gradient expected SARSA algorithm with linear function approximation for the Q value.
  • The agent begins with all Q weights initialized to 0.
  • The learning rate for the agent was set at $\frac{1}{10} \times (\text{# tiles})^{-1}$, where $\text{# tiles}$ in this case was $8 \times 8 \times 8 = 512$
  • During learning, the agent selected its actions using an $\epsilon$-soft policy, where $\epsilon$ is set to 0.10
  • The agent's temporal discount factor is set to $\gamma = 0.95$ (though I realize it could have been 1 given the episodic nature of the task)
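Putting the details above together, the per-step update might look like the following sketch. The function names and structure are my own; the constants mirror the values listed above, and `active` is assumed to be the list of active tile indices for a state:

```python
import numpy as np

N_ACTIONS = 3
N_TILES = 8 * 8 * 8            # 512 binary features
ALPHA = 0.1 / N_TILES          # learning rate from the question
GAMMA, EPS = 0.95, 0.10

def q(w, active, a):
    # linear Q: sum of the weights on the active binary tiles
    return w[a, active].sum()

def expected_q(w, active):
    # expectation of Q(s', .) under the eps-soft behaviour policy
    qs = np.array([q(w, active, a) for a in range(N_ACTIONS)])
    probs = np.full(N_ACTIONS, EPS / N_ACTIONS)
    probs[qs.argmax()] += 1.0 - EPS
    return float(probs @ qs)

def update(w, active, a, r, next_active, done):
    # the gradient of a linear Q w.r.t. w is the binary feature
    # vector itself, so the semi-gradient step only touches the
    # weights of the active tiles for the taken action
    target = r if done else r + GAMMA * expected_q(w, next_active)
    w[a, active] += ALPHA * (target - q(w, active, a))
```

Starting from all-zero weights, a single non-terminal step with reward $-1$ moves the 8 active weights by $-\alpha$ each.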

My Question

One possibility I want to rule out is that my implementation of the learning agent is incorrect. To that end I am curious to know whether others experience this kind of "forgetting" behavior (even on simple problems like this), and if so, how it might be reduced.

Comment (Apr 17, 2021): @StuBernis I am interested in studying your code.

1 Answer

The same thing was happening to me with a deep Q-network on the cart-pole problem. Keeping a replay memory of past (S, A, R, S′) transitions and sampling it to form mini-batches alongside the new observations helped a lot to reduce catastrophic forgetting. Reducing the step size once the agent had improved by a certain amount also helped.
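A minimal sketch of the replay memory described above (the capacity and interface are my own assumptions, not taken from the answer):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size memory of past transitions. Sampling uniformly mixes
    old and new experience in each mini-batch, which breaks the
    temporal correlation that drives catastrophic forgetting."""

    def __init__(self, capacity=10_000):
        self.buf = deque(maxlen=capacity)  # oldest entries evicted first

    def push(self, s, a, r, s_next, done):
        self.buf.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # sample without replacement; cap at the current buffer size
        return random.sample(list(self.buf), min(batch_size, len(self.buf)))
```

Each learning step would then draw a batch with `buffer.sample(32)` and apply the TD update to every transition in it.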
