Questions tagged [actor-critic]
The actor-critic tag has no summary.
28 questions
1 vote, 0 answers, 54 views
Actor-Critic one step TD update rule
In Sutton & Barto's book (Chapter $13$), it is stated that the update rule in REINFORCE could be rewritten as \begin{equation} \begin{split} \theta_{t+1} &=\theta_t+\alpha\left(G_{t:t+1}-\hat{...
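For readers following this question, here is a minimal sketch of a one-step actor-critic update in the tabular case, assuming a softmax policy over per-state action preferences `theta[s][a]` and a tabular critic `w[s]`. The function name and parameters are hypothetical. The TD error plays the role of $G_{t:t+1}-\hat{v}(S_t,\mathbf{w})$; the book's episodic pseudocode also scales the actor step by a discount factor $I=\gamma^t$, which is omitted here for brevity.

```python
import math

def softmax(prefs):
    # Numerically stable softmax over action preferences h(s, a, theta)
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]

def one_step_actor_critic_update(theta, w, s, a, r, s_next, done,
                                 gamma=0.99, alpha_theta=0.1, alpha_w=0.1):
    """Hypothetical one-step actor-critic update (tabular case).

    theta[s][a]: policy preferences; w[s]: state-value estimate v_hat(s, w).
    Updates theta and w in place and returns the TD error delta.
    """
    v_next = 0.0 if done else w[s_next]
    # delta = R + gamma * v_hat(S') - v_hat(S), i.e. G_{t:t+1} - v_hat(S_t)
    delta = r + gamma * v_next - w[s]
    # Critic: semi-gradient TD(0); d v_hat(s) / d w[s] = 1 for a table
    w[s] += alpha_w * delta
    # Actor: grad ln pi(a|s) for a tabular softmax is one_hot(a) - pi(.|s)
    pi = softmax(theta[s])
    for b in range(len(theta[s])):
        grad_ln_pi = (1.0 if b == a else 0.0) - pi[b]
        theta[s][b] += alpha_theta * delta * grad_ln_pi
    return delta
```

With all-zero initial values and a reward of 1, the TD error is 1, the critic moves `w[0]` to 0.1, and the preference of the taken action rises while the other falls by the same amount.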
1 vote, 0 answers, 97 views
What is the meaning of $\alpha$ in the TD3 algorithm?
I am studying the TD3 paper. I am curious about the meaning of $\alpha$ where the paper proves that overestimation will happen in a critical situation. The contents about mathematical ...
0 votes, 1 answer, 440 views
Actor Network Target Value in A2C Reinforcement Learning
In DQN, we use the equation $Target = r+\gamma v(s')$ to train (fit) our network. It is easy to understand, since we use the $Target$ value as the dependent variable, as we do in supervised learning. I....
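A minimal sketch of how the target from this question is commonly used in A2C, assuming a single transition and scalar inputs (the function names are hypothetical). The critic regresses $v(s)$ toward $r+\gamma v(s')$, much like the DQN target; the actor has no regression target of its own and instead weights $-\log\pi(a\mid s)$ by the advantage estimate. The squared-error critic loss is one common choice, not the only one.

```python
def td_target(r, v_next, gamma=0.99, done=False):
    # Target = r + gamma * v(s'); do not bootstrap from a terminal state
    return r if done else r + gamma * v_next

def a2c_losses(r, v_s, v_next, log_prob_a, gamma=0.99, done=False):
    """Return (critic_loss, actor_loss) for one transition.

    critic: squared TD error, pulling v(s) toward the target.
    actor: -log pi(a|s) * advantage, where advantage = target - v(s)
    is treated as a constant (no gradient flows through it).
    """
    target = td_target(r, v_next, gamma, done)
    advantage = target - v_s
    critic_loss = (target - v_s) ** 2
    actor_loss = -log_prob_a * advantage
    return critic_loss, actor_loss
```

For example, with `r=1.0`, `v_s=0.5`, `v_next=1.0`, `gamma=0.9` and `log_prob_a=-2.0`, the target is 1.9, the advantage is 1.4, the critic loss is about 1.96 and the actor loss is about 2.8.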
1 vote, 0 answers, 175 views
A2C learning very slowly when I try to make it learn on batches as compared to making it learn on each step
I tried this on the OpenAI Gym environment LunarLander-v2. I wrote two algorithms with just one difference: one learns on each step, the other learns at the end of each episode. There is a significant ...
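The question is truncated, so the exact targets used are unknown. As a sketch under the assumption that the per-step variant bootstraps from the critic while the end-of-episode variant uses full discounted returns, the two kinds of targets can be computed like this (function names are hypothetical):

```python
def discounted_returns(rewards, gamma=0.99):
    # End-of-episode targets: G_t = r_t + gamma * G_{t+1}, computed backwards
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns

def one_step_targets(rewards, next_values, dones, gamma=0.99):
    # Per-step targets: r_t + gamma * v(s_{t+1}), no bootstrap after a terminal
    return [r + (0.0 if d else gamma * v)
            for r, v, d in zip(rewards, next_values, dones)]
```

The return-based targets have higher variance and are only available once the episode ends, which is one plausible reason batched, end-of-episode updates can learn more slowly on long episodes.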
1 vote, 0 answers, 49 views
Action selection in an actor-critic algorithm
I have an action space that is just a list of values given by acts = [i for i in range(10, 100, 10)]. According to the PyTorch documentation, the loss is calculated as below. Could someone explain to me how ...
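A minimal plain-Python sketch of how a discrete value list like `acts` is usually handled, assuming the policy network outputs one logit per entry of `acts`. The policy samples an index, the environment receives `acts[index]`, and the log-probability used in the loss belongs to the index. This mirrors what sampling from `torch.distributions.Categorical` does, but `select_action` itself is a hypothetical helper, not a PyTorch API.

```python
import math
import random

acts = [i for i in range(10, 100, 10)]  # [10, 20, ..., 90], as in the question

def select_action(logits, rng=random):
    """Sample an index from softmax(logits) and map it to a value in acts.

    Returns (action_value, index, log_prob). The policy-gradient loss
    would use log_prob of the sampled index, not the value itself.
    """
    assert len(logits) == len(acts), "one logit per discrete action"
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    idx = rng.choices(range(len(acts)), weights=probs, k=1)[0]
    return acts[idx], idx, math.log(probs[idx])
```

Keeping the index separate from the value means the network never has to output 10, 20, ... directly; it only chooses among 9 categories.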
1 vote, 0 answers, 110 views
Rewards converge but with a lot of variation
I am training a reinforcement learning agent on an episodic task of fixed episode length. I am tracking the training process by plotting the cumulative rewards over an episode. I am using TensorBoard ...
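To separate the trend from episode-to-episode noise when plotting cumulative rewards, a sketch like the following can help, assuming the per-episode cumulative rewards are available as a list. The exponential moving average is similar in spirit to TensorBoard's smoothing slider, though not necessarily identical to its implementation; the rolling standard deviation quantifies how much variation remains. Both helpers are hypothetical.

```python
import statistics

def ema_smooth(values, weight=0.9):
    # Exponential moving average: s_t = weight * s_{t-1} + (1 - weight) * x_t
    smoothed = []
    last = None
    for x in values:
        last = x if last is None else weight * last + (1 - weight) * x
        smoothed.append(last)
    return smoothed

def rolling_std(values, window=10):
    # Population std of cumulative rewards over the last `window` episodes
    return [statistics.pstdev(values[max(0, i - window + 1): i + 1])
            for i in range(len(values))]
```

A smoothed curve that is flat while the rolling standard deviation stays large suggests the mean has converged but the policy (or the environment) is still stochastic.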
1 vote, 1 answer, 98 views
Formulation of a reward structure
I am new to reinforcement learning and experimenting with training RL agents. I have a question about reward formulation: from a given state, if an agent takes a good action, I give a positive reward, ...
0 votes, 1 answer, 335 views
How to handle differences between training and deploying of an RL agent
Hi, I am training an RL agent for a control problem. The objective of the agent is to maintain the temperature in a zone. It is an episodic task with an episode length of 10 hours, with actions being taken every ...