Newest 'actor-critic' Questions

1 vote

0 answers

54 views

Actor-Critic one step TD update rule

In Sutton & Barto's book (Chapter $13$), it is stated that the update rule in REINFORCE can be reformulated as \begin{equation} \begin{split} \theta_{t+1} &=\theta_t+\alpha\left(G_{t:t+1}-\hat{...

Hadar

167

asked Dec 2, 2022 at 13:47

1 vote

0 answers

97 views

What is the meaning of $\alpha$ in the TD3 algorithm?

I am studying the TD3 paper. I am curious about the meaning of $\alpha$ in the part where the paper proves that overestimation will happen in a critical situation. The contents about mathematical ...

jackson

25

asked Sep 14, 2022 at 16:37

0 votes

1 answer

440 views

Actor Network Target Value in A2C Reinforcement Learning

In DQN, we use the equation $Target = r+\gamma v(s')$ to train (fit) our network. It is easy to understand, since we use the $Target$ value as the dependent variable, as we do in supervised learning. I....

datatech

53

asked Apr 15, 2021 at 16:43
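For the A2C target question above, a minimal sketch of how a one-step bootstrapped target is commonly used in actor-critic methods: the critic is regressed toward the target, while the actor has no supervised target and is instead weighted by the advantage (target minus current value estimate). The function names `td_target` and `a2c_losses`, the default `gamma`, and the scalar inputs are assumptions for illustration, not taken from the question.

```python
def td_target(reward, next_value, gamma=0.99, done=False):
    """One-step bootstrapped target r + gamma * V(s').

    V(s') is treated as 0 when the episode has terminated.
    """
    return reward + (0.0 if done else gamma * next_value)


def a2c_losses(log_prob, reward, value, next_value, gamma=0.99, done=False):
    """Return (actor_loss, critic_loss) for a single transition.

    log_prob   : log pi(a|s) of the action that was taken (hypothetical input)
    value      : critic estimate V(s)
    next_value : critic estimate V(s')
    """
    target = td_target(reward, next_value, gamma, done)
    advantage = target - value
    # Critic: squared error against the bootstrapped target.
    critic_loss = advantage ** 2
    # Actor: policy-gradient surrogate; the advantage acts as a fixed weight.
    actor_loss = -log_prob * advantage
    return actor_loss, critic_loss


# Illustrative numbers only.
actor_loss, critic_loss = a2c_losses(
    log_prob=-0.5, reward=1.0, value=2.0, next_value=3.0, gamma=0.9
)
```

In an autodiff framework the advantage would normally be detached from the graph before it multiplies the log-probability, so that the actor loss does not push gradients into the critic.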

1 vote

0 answers

175 views

A2C learning very slowly when I try to make it learn on batches as compared to making it learn on each step

I tried this on the OpenAI Gym environment LunarLander-v2. I wrote two algorithms with just one difference: one learns on each step, the other learns at the end of each episode. There is a significant ...

starlord

11

asked Jun 14, 2020 at 12:58

1 vote

0 answers

49 views

Action selection in actor-critic algorithm

I have an action space that is just a list of values given by acts = [i for i in range(10, 100, 10)]. According to the PyTorch documentation, the loss is calculated as below. Could someone explain to me how ...

EArwa

75

asked Mar 30, 2020 at 12:57
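For the action-selection question above, a minimal standard-library sketch of one common approach: the actor outputs one logit per entry of `acts`, an index is sampled from the softmax distribution, and the index is mapped back to the actual value. The helper names `softmax` and `select_action`, and the all-zero logits standing in for the actor's output, are assumptions for illustration only.

```python
import math
import random

# Discrete action values, as given in the question.
acts = [i for i in range(10, 100, 10)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(logits, rng):
    """Sample an index from the policy and map it to a value in acts.

    Returns (action_value, index, log_prob); log_prob is what a
    policy-gradient loss would later multiply by the advantage.
    """
    probs = softmax(logits)
    index = rng.choices(range(len(acts)), weights=probs, k=1)[0]
    return acts[index], index, math.log(probs[index])


rng = random.Random(0)
logits = [0.0] * len(acts)  # hypothetical actor output: uniform policy
action, index, log_prob = select_action(logits, rng)
```

The network only ever deals with the index; the list `acts` is a lookup table applied when the action is sent to the environment.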

1 vote

0 answers

110 views

Rewards have converged but with a lot of variation

I am training a reinforcement learning agent on an episodic task of fixed episode length. I am tracking the training process by plotting the cumulative rewards over an episode. I am using TensorBoard ...

chink

565

asked Nov 29, 2019 at 10:28

1 vote

1 answer

98 views

Formulation of a reward structure

I am new to reinforcement learning and experimenting with training RL agents. I have a question about reward formulation: from a given state, if an agent takes a good action I give a positive reward, ...

chink

565

asked Nov 26, 2019 at 10:26

0 votes

1 answer

335 views

How to handle differences between training and deployment of an RL agent

Hi, I am training an RL agent for a control problem. The objective of the agent is to maintain the temperature in a zone. It is an episodic task with an episode length of 10 hrs and actions being taken every ...

chink

565

asked Nov 18, 2019 at 7:16
