Questions tagged [actor-critic-methods]
For questions related to the family of reinforcement learning algorithms denoted by "actor-critic", where there is an actor (a policy) and a critic (a value function).
130 questions
3 votes
1 answer
237 views
Is Clipping Necessary for PPO?
I believe I have a decent understanding of PPO, but I also feel that it could be stated in a simpler, more intuitive way that does not involve the clipping function. That makes me wonder if there is ...
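For context, the clipping the question refers to is PPO's clipped surrogate objective. A minimal NumPy sketch of the per-sample objective (an illustration of the standard formula, not code from the question):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate per sample: min(r*A, clip(r, 1-eps, 1+eps)*A),
    where r is the new/old policy probability ratio and A the advantage."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped)

# With a positive advantage, gains from pushing the ratio above 1+eps are
# cut off: ratio 1.5, advantage 2.0 yields 1.2 * 2.0 = 2.4, not 3.0.
print(ppo_clip_objective(np.array([1.5]), np.array([2.0])))
```

The `min` makes the clipping one-sided in effect: the objective never rewards moving the ratio further outside the trust region, which is the part that is hard to state without the clip.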
1 vote
1 answer
28 views
How does critic influence actor in "Encoder-Core-Decoder" (in shared and separate network)?
I'm learning RL and understand the basic actor-critic concept, but I'm confused about the technical details of how the critic actually influences the actor during training. Here's my current ...
2 votes
1 answer
109 views
Implementing A3C for CarRacing-v3 continuous action case
The problem I am facing right now is tying the theory from Sutton & Barto about advantage actor-critic to the implementation of A3C I read here. From what I understand: The critic network (value ...
0 votes
1 answer
107 views
Convergence results on tabular Actor-Critic and REINFORCE methods?
I wonder what the theoretical results are for the convergence of AC and REINFORCE methods, even in the simplest tabular setup, without a neural network. Are there such results? What are the good ...
0 votes
1 answer
73 views
Training Cartpole with actor critic
I used this code to train an actor-critic algorithm to solve Cartpole. Note that this is not one-step actor-critic but a Monte-Carlo AC. What I found is that the training process is very unstable, although ...
0 votes
0 answers
56 views
Does Vanilla Actor Critic actually work?
I tried to use vanilla AC to solve the classic control problems and none of them worked, even the simple Cartpole environment. I used a shared layer of 128 neurons for both the actor and the critic ...
1 vote
1 answer
158 views
Are these objective and loss functions from Actor-Critic Methods correct?
I'm doing research on actor-critic methods and I want to make sure that I understand these methods correctly. First of all, I understand that as it's a combination of value-based and policy-based ...
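For anyone checking their formulas against the question above: in one-step (TD) actor-critic, both losses are usually built from the same TD error. A minimal plain-Python sketch (symbol names are my own choice, not taken from the question):

```python
def actor_critic_losses(log_prob, value, reward, next_value, gamma=0.99):
    """One-step actor-critic losses for a single transition (scalar sketch).

    The TD error r + gamma*V(s') - V(s) serves double duty: it is the
    critic's regression error and the actor's advantage estimate.
    """
    td_error = reward + gamma * next_value - value
    critic_loss = td_error ** 2          # critic: minimize squared TD error
    actor_loss = -log_prob * td_error    # actor: maximize log pi(a|s) * advantage
    return actor_loss, critic_loss
```

In a real implementation the TD error is treated as a constant (detached) in the actor loss, so that only the critic loss trains the value function.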
0 votes
0 answers
41 views
Model the Policy for policy gradient for the 2D cutting stock problem
I need to implement a policy gradient algorithm (actor-critic) for the 2D cutting stock problem with varied-size stocks. However, I'm new to machine learning, so I still have no clue how to design the ...
3 votes
1 answer
124 views
Is DPG a policy-based method or an actor-critic method?
I have a question about whether the Deterministic Policy Gradient algorithm in its basic form is policy-based or actor-critic. I have been searching for the answer for a while, and in some cases it ...
1 vote
1 answer
184 views
Is this actor-critic algorithm correct?
See slide 92 here: https://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture14.pdf I did not understand the expression for the critic in the update of $\Delta \phi$, and also why it is updated at the end of ...
1 vote
0 answers
77 views
Proof for Using Q-Function in Policy Gradient Formula
Currently I am reading the OpenAI Spinning Up document about policy gradient and actor-critic methods. In this webpage, where they replace the return with the action value, I think they are trying to prove that the ...
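The step that question asks about is usually justified as follows (a sketch of the standard argument, not the Spinning Up text itself). The reward-to-go form of the policy gradient is

$$\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\Big[\sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, \sum_{t' \ge t} r_{t'}\Big],$$

and conditioning on $(s_t, a_t)$, the expectation of the reward-to-go is by definition $Q^{\pi_\theta}(s_t, a_t)$, so by the tower property of expectation

$$\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\Big[\sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, Q^{\pi_\theta}(s_t, a_t)\Big].$$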
0 votes
1 answer
93 views
I have a few doubts understanding and implementing Proximal Policy Optimisation Algorithm [closed]
What is the difference between a rollout buffer and a replay buffer (as used in DQNs). Why can't they be used interchangeably? Why is the trajectory sampling parallelized? Is it just for making data ...
1 vote
1 answer
79 views
Doubt regarding Actor-Critic method
As stated in Sutton and Barto: In REINFORCE with baseline, the learned state-value function estimates the value of the first state of each state transition. This estimate sets a baseline for the ...
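For reference, the update that excerpt paraphrases is the REINFORCE-with-baseline rule, which in Sutton & Barto's notation is roughly (a sketch; see their Section 13.4 for the exact equation):

$$\theta_{t+1} = \theta_t + \alpha \big(G_t - \hat{v}(S_t, \mathbf{w})\big)\, \nabla_\theta \ln \pi(A_t \mid S_t, \theta_t),$$

where $G_t$ is the Monte-Carlo return and $\hat{v}(S_t, \mathbf{w})$ is the learned baseline. The method becomes actor-critic only when the baseline is also used for bootstrapping (e.g. a one-step TD target replaces $G_t$).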
2 votes
1 answer
123 views
Does Actor-Critic need to find the goal to have good updates and succeed?
I'm trying to solve MountainCar and CarRacing (I love cars) in the Gym environment with DDPG, and my algorithm struggles. I have thought about why this doesn't work, and I would like to know if my reasoning is false. ...
1 vote
1 answer
279 views
DDPG model outputting a fixed action at every timestep
I am trying to create a car-following model, for which I am using DDPG. My action is acceleration, bounded in the range [-3, 3] m/s². While training the model, for every state it gives a single ...
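A common cause of a DDPG actor emitting the same action everywhere is a saturated tanh output combined with missing exploration noise. A minimal sketch of bounded action selection for a [-3, 3] m/s² range like the one described above (the `policy_out` input is a hypothetical pre-squash network output, not the asker's code):

```python
import numpy as np

def select_action(policy_out, low=-3.0, high=3.0, noise_std=0.3, rng=None):
    """Squash a raw policy output into [low, high] with tanh, add Gaussian
    exploration noise (DDPG-style), and clip back into bounds."""
    if rng is None:
        rng = np.random.default_rng()
    # Map tanh's (-1, 1) range onto [low, high].
    action = low + (np.tanh(policy_out) + 1.0) * 0.5 * (high - low)
    # Without this noise, a deterministic actor explores nothing and can
    # get stuck emitting one (often saturated) action.
    action = action + rng.normal(0.0, noise_std, size=np.shape(action))
    return np.clip(action, low, high)
```

If the raw outputs are large in magnitude, tanh saturates at ±1 and the gradient through it vanishes, which also produces a fixed boundary action; keeping the pre-squash outputs small at initialization helps.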