Questions tagged [actor-critic-methods]
For questions related to the family of reinforcement learning algorithms denoted by "actor-critic", where there is an actor (a policy) and a critic (a value function).
130 questions
3 votes
1 answer
237 views
Is Clipping Necessary for PPO?
I believe I have a decent understanding of PPO, but I also feel that it could be stated in a simpler, more intuitive way that does not involve the clipping function. That makes me wonder if there is ...
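For context, the clipping the question refers to is PPO's clipped surrogate objective. A minimal NumPy sketch of the per-sample objective (an illustration of the standard formula, not code from the question):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate per sample: min(r*A, clip(r, 1-eps, 1+eps)*A),
    where r is the new/old policy probability ratio and A the advantage."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped)

# With a positive advantage, gains from pushing the ratio above 1+eps are
# cut off: ratio 1.5, advantage 2.0 yields 1.2 * 2.0 = 2.4, not 3.0.
print(ppo_clip_objective(np.array([1.5]), np.array([2.0])))
```

The `min` makes the clipping one-sided in effect: the objective never rewards moving the ratio further outside the trust region, which is the part that is hard to state without the clip.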
1 vote
1 answer
28 views
How does critic influence actor in "Encoder-Core-Decoder" (in shared and separate network)?
I'm learning RL and understand the basic actor-critic concept, but I'm confused about the technical details of how the critic actually influences the actor during training. Here's my current ...
2 votes
1 answer
109 views
Implementing A3C for CarRacing-v3 continuous action case
The problem I am facing right now is tying the theory from Sutton & Barto about advantage actor-critic to the implementation of A3C I read here. From what I understand: The critic network (value ...
0 votes
1 answer
107 views
Convergence results on tabular Actor-Critic and REINFORCE methods?
I wonder what the theoretical results are for the convergence of AC and REINFORCE methods, even in the simplest tabular setup, without a neural network. Are there such results? What are the good ...
0 votes
1 answer
73 views
Training Cartpole with actor critic
I used this code to train an actor-critic algorithm to solve Cartpole. Note that this is not one-step actor-critic but a Monte-Carlo AC. What I found is that the training process is very unstable, although ...
0 votes
0 answers
56 views
Does Vanilla Actor Critic actually work?
I tried to use vanilla AC to solve the classic control problems and none of them worked, even the simple Cartpole environment. I used a shared layer of 128 neurons for both the actor and the critic ...
1 vote
1 answer
158 views
Are these objective and loss functions from Actor-Critic Methods correct?
I'm doing research on actor-critic methods and I want to make sure that I understand these methods correctly. First of all, I understand that as it's a combination of value-based and policy-based ...
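For anyone checking their formulas against the question above: in one-step (TD) actor-critic, both losses are usually built from the same TD error. A minimal plain-Python sketch (symbol names are my own choice, not taken from the question):

```python
def actor_critic_losses(log_prob, value, reward, next_value, gamma=0.99):
    """One-step actor-critic losses for a single transition (scalar sketch).

    The TD error r + gamma*V(s') - V(s) serves double duty: it is the
    critic's regression error and the actor's advantage estimate.
    """
    td_error = reward + gamma * next_value - value
    critic_loss = td_error ** 2          # critic: minimize squared TD error
    actor_loss = -log_prob * td_error    # actor: maximize log pi(a|s) * advantage
    return actor_loss, critic_loss
```

In a real implementation the TD error is treated as a constant (detached) in the actor loss, so that only the critic loss trains the value function.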
0 votes
0 answers
41 views
Model the Policy for policy gradient for the 2D cutting stock problem
I need to implement a policy gradient algorithm (actor-critic) for the 2D cutting stock problem with varied-size stocks. However, I'm new to machine learning, so I still have no clue how to design the ...
3 votes
1 answer
124 views
Is DPG a policy-based method or an actor-critic method?
I have a question about whether the Deterministic Policy Gradient algorithm in its basic form is policy-based or actor-critic. I have been searching for the answer for a while, and in some cases it ...
1 vote
1 answer
184 views
Is this actor-critic algorithm correct?
See slide 92 here: https://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture14.pdf I did not understand the expression for the critic in the update of $\Delta \phi$, and also why it is updated at the end of ...
1 vote
0 answers
77 views
Proof for Using Q-Function in Policy Gradient Formula
Currently I am reading the OpenAI Spinning Up document about policy gradient and actor-critic methods. In this webpage, where they replace the return with the action value, I think they are trying to prove that the ...
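The step that question asks about is usually justified as follows (a sketch of the standard argument, not the Spinning Up text itself). The reward-to-go form of the policy gradient is

$$\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\Big[\sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, \sum_{t' \ge t} r_{t'}\Big],$$

and conditioning on $(s_t, a_t)$, the expectation of the reward-to-go is by definition $Q^{\pi_\theta}(s_t, a_t)$, so by the tower property of expectation

$$\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\Big[\sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, Q^{\pi_\theta}(s_t, a_t)\Big].$$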
0 votes
1 answer
93 views
I have a few doubts understanding and implementing Proximal Policy Optimisation Algorithm [closed]
What is the difference between a rollout buffer and a replay buffer (as used in DQNs). Why can't they be used interchangeably? Why is the trajectory sampling parallelized? Is it just for making data ...
1 vote
1 answer
79 views
Doubt regarding Actor-Critic method
As stated in Sutton and Barto: In REINFORCE with baseline, the learned state-value function estimates the value of the first state of each state transition. This estimate sets a baseline for the ...
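For reference, the update that excerpt paraphrases is the REINFORCE-with-baseline rule, which in Sutton & Barto's notation is roughly (a sketch; see their Section 13.4 for the exact equation):

$$\theta_{t+1} = \theta_t + \alpha \big(G_t - \hat{v}(S_t, \mathbf{w})\big)\, \nabla_\theta \ln \pi(A_t \mid S_t, \theta_t),$$

where $G_t$ is the Monte-Carlo return and $\hat{v}(S_t, \mathbf{w})$ is the learned baseline. The method becomes actor-critic only when the baseline is also used for bootstrapping (e.g. a one-step TD target replaces $G_t$).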
2 votes
1 answer
123 views
Does Actor-Critic need to find the goal to have good updates and succeed?
I'm trying to solve MountainCar and CarRacing (I love cars) in the Gym environment with DDPG, and my algorithm struggles. I have thought about why this doesn't work, and I would like to know if my reasoning is false. ...
1 vote
1 answer
279 views
DDPG model outputting a fixed action at every timestep
I am trying to create a car-following model, for which I am using DDPG. My action is acceleration, bounded in the range [-3, 3] m/s². While training the model, for every state it gives a single ...
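A common cause of a DDPG actor emitting the same action everywhere is a saturated tanh output combined with missing exploration noise. A minimal sketch of bounded action selection for a [-3, 3] m/s² range like the one described above (the `policy_out` input is a hypothetical pre-squash network output, not the asker's code):

```python
import numpy as np

def select_action(policy_out, low=-3.0, high=3.0, noise_std=0.3, rng=None):
    """Squash a raw policy output into [low, high] with tanh, add Gaussian
    exploration noise (DDPG-style), and clip back into bounds."""
    if rng is None:
        rng = np.random.default_rng()
    # Map tanh's (-1, 1) range onto [low, high].
    action = low + (np.tanh(policy_out) + 1.0) * 0.5 * (high - low)
    # Without this noise, a deterministic actor explores nothing and can
    # get stuck emitting one (often saturated) action.
    action = action + rng.normal(0.0, noise_std, size=np.shape(action))
    return np.clip(action, low, high)
```

If the raw outputs are large in magnitude, tanh saturates at ±1 and the gradient through it vanishes, which also produces a fixed boundary action; keeping the pre-squash outputs small at initialization helps.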