
Questions tagged [actor-critic]

1 vote
0 answers
54 views

In Sutton & Barto's book (Chapter $13$), it is stated that the update rule in REINFORCE could be reformulated as \begin{equation} \begin{split} \theta_{t+1} &=\theta_t+\alpha\left(G_{t:t+1}-\hat{...
Hadar's user avatar
  • 167
1 vote
0 answers
97 views

I am studying the paper on the TD3 algorithm. I am curious about the meaning of $\alpha$ where the paper proves that overestimation will occur in a critical situation. The contents about mathematical ...
jackson's user avatar
  • 25
0 votes
1 answer
440 views

In DQN, we use the equation $Target = r+\gamma v(s')$ to train (fit) our network. It is easy to understand, since we use the $Target$ value as the dependent variable, as we do in supervised learning. I....
datatech's user avatar
1 vote
0 answers
175 views

I tried this on the OpenAI Gym environment LunarLander-v2. I wrote two algorithms with just one difference: (1) one learns on each step; (2) the other learns at the end of each episode. There is a significant ...
starlord's user avatar
1 vote
0 answers
49 views

I have an action space that is just a list of values given by acts = [i for i in range(10, 100, 10)]. According to the PyTorch documentation, the loss is calculated as below. Could someone explain to me how ...
EArwa's user avatar
  • 75
1 vote
0 answers
110 views

I am training a reinforcement learning agent on an episodic task with a fixed episode length. I am tracking the training process by plotting the cumulative reward over each episode. I am using TensorBoard ...
chink's user avatar
  • 565
1 vote
1 answer
98 views

I am new to reinforcement learning and am experimenting with training RL agents. I have a question about reward formulation: if an agent takes a good action from a given state, I give a positive reward, ...
chink's user avatar
  • 565
0 votes
1 answer
335 views

Hi, I am training an RL agent for a control problem. The objective of the agent is to maintain the temperature in a zone. It is an episodic task with an episode length of 10 hrs and actions being taken every ...
chink's user avatar
  • 565
