I'm studying "Deep Reinforcement Learning" and built my own example based on PyTorch's REINFORCEMENT LEARNING (DQN) TUTORIAL.
I implement the actor's strategy as follows:

1. `self.net.eval()`
2. get the best action from the net
3. `self.net.train()`
My question is: does going back and forth between eval() and train() modes cause any damage to the optimization process?
The model consists only of Linear and BatchNorm1d layers. As far as I know, when using BatchNorm1d one must call model.eval() before using the model for inference, because the layer produces different results in eval() and train() modes.
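For example, a quick standalone check (a minimal sketch, not my actual model) makes the difference visible: in train() mode BatchNorm1d normalizes with the current batch's statistics, while in eval() mode it uses the accumulated running statistics:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)      # 4 features
x = torch.randn(8, 4)       # batch of 8 samples

bn.train()
out_train = bn(x)           # normalized with this batch's mean/var
                            # (also updates the running statistics)

bn.eval()
out_eval = bn(x)            # normalized with the running mean/var

print(torch.allclose(out_train, out_eval))  # False in general
```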
When training a classification neural network, model.eval() is called only once, after training is finished; but in "Deep Reinforcement Learning" it is usual to act with the current policy and then continue the optimization process.
So I'm wondering whether going back and forth between the modes is "harmless" to the optimization process. Here is the relevant code:
```python
from random import random, choice

def strategy(self, state):
    # Explore: with probability epsilon, take a random action
    if self.epsilon > random():
        action = choice(self.actions)
    # Exploit: otherwise take the net's best action (greedy w.r.t. Q-values)
    else:
        self.net.eval()   # switch BatchNorm1d to its running statistics
        action = self.net(state.unsqueeze(0)).max(1)[1].detach()
        self.net.train()  # switch back for the next optimization step
    return action
```
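For context, the method is used roughly like this inside the training loop (a simplified sketch following the structure of the DQN tutorial; `env`, `agent.memory`, and `optimize_model` are placeholders, not my exact code):

```python
state = env.reset()
for step in range(num_steps):
    action = agent.strategy(state)                        # net briefly in eval() here
    next_state, reward, done = env.step(action)
    agent.memory.push(state, action, reward, next_state)  # store the transition
    agent.optimize_model()                                # net is in train() mode again
    state = env.reset() if done else next_state
```

So within every single environment step the net flips to eval() for action selection and back to train() for the optimization step.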