
Questions tagged [actor-critic-methods]

For questions related to the family of reinforcement learning algorithms known as "actor-critic" methods, which combine an actor (a policy) and a critic (a value function).
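As a rough illustration of how the critic drives the actor, here is a minimal sketch of a tabular one-step actor-critic update, assuming a softmax policy over action preferences `theta[s][a]` and a state-value table `v[s]` (both names are hypothetical). The critic's TD error acts as the advantage signal that scales the actor's policy-gradient step. For simplicity it omits the γ^t factor that Sutton & Barto's pseudocode includes, as many implementations do.

```python
import math
import random

def softmax(prefs):
    # Numerically stable softmax over action preferences
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def actor_critic_step(theta, v, s, a, r, s_next, done,
                      gamma=0.99, alpha_actor=0.1, alpha_critic=0.1):
    """One-step actor-critic update (tabular sketch).

    theta[s][a]: actor preferences defining a softmax policy pi(a|s)
    v[s]:        critic's state-value estimate
    """
    # Critic: TD(0) error; the value of a terminal state is taken as 0
    target = r if done else r + gamma * v[s_next]
    delta = target - v[s]
    v[s] += alpha_critic * delta
    # Actor: grad of log softmax w.r.t. theta[s][b] is 1[b == a] - pi(b|s),
    # scaled by the critic's TD error
    pi = softmax(theta[s])
    for b in range(len(theta[s])):
        grad_log = (1.0 if b == a else 0.0) - pi[b]
        theta[s][b] += alpha_actor * delta * grad_log
    return delta

# Toy run: one state, two actions, only action 1 is rewarded
random.seed(0)
theta = [[0.0, 0.0]]
v = [0.0]
for _ in range(500):
    pi = softmax(theta[0])
    a = 0 if random.random() < pi[0] else 1
    r = 1.0 if a == 1 else 0.0
    actor_critic_step(theta, v, s=0, a=a, r=r, s_next=0, done=True)

probs = softmax(theta[0])
print(probs)  # action 1's probability should have grown well above 0.5
```

In this toy setting, a positive TD error after action 1 and a negative one after action 0 both push the preference toward action 1, which is the mechanism by which the critic steers the actor.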

3 votes
1 answer
237 views

I believe I have a decent understanding of PPO, but I also feel that it could be stated in a simpler, more intuitive way that does not involve the clipping function. That makes me wonder if there is ...
Beane (reputation 152)
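For reference, a minimal per-sample sketch of the PPO-Clip surrogate objective, with `ratio` = π_new(a|s) / π_old(a|s) and an advantage estimate supplied by the critic (function names are hypothetical). The second function is the clip-free form given in the OpenAI Spinning Up PPO documentation, min(r·A, g(ε, A)) with g(ε, A) = (1+ε)A if A ≥ 0 and (1-ε)A otherwise; the two forms agree for every input.

```python
def clip(x, lo, hi):
    # Clamp x into [lo, hi]
    return max(lo, min(x, hi))

def ppo_clip_objective(ratio, advantage, eps=0.2):
    # min(r*A, clip(r, 1-eps, 1+eps)*A): the min makes this a pessimistic
    # bound, so moving the policy too far earns no extra credit
    unclipped = ratio * advantage
    clipped = clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return min(unclipped, clipped)

def ppo_clip_objective_no_clip(ratio, advantage, eps=0.2):
    # Equivalent form: the cap depends only on the sign of the advantage
    g = (1.0 + eps) * advantage if advantage >= 0 else (1.0 - eps) * advantage
    return min(ratio * advantage, g)

# Positive advantage: the gain is capped once ratio exceeds 1 + eps
print(ppo_clip_objective(1.5, 1.0))
print(ppo_clip_objective_no_clip(1.5, 1.0))
```

Read this way, PPO-Clip simply stops rewarding increases in an action's probability past 1+ε when the advantage is positive, and stops rewarding decreases past 1-ε when it is negative.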
1 vote
1 answer
28 views

I'm learning RL and understand the basic actor-critic concept, but I'm confused about the technical details of how the critic actually influences the actor during training. Here's my current ...
Reyomi (reputation 131)
2 votes
1 answer
109 views

The problem I am facing right now is connecting the theory of advantage actor-critic from Sutton & Barto to the implementation of A3C I read here. From what I understand: The critic network (value ...
DeadAsDuck
0 votes
1 answer
107 views

I wonder what the theoretical convergence results are for AC and REINFORCE methods, even in the simplest tabular setting without neural networks. Are there such results? What are the good ...
Alexander Chervov
0 votes
1 answer
73 views

I used this code to train an actor-critic algorithm to solve CartPole. Note that this is not one-step actor-critic but Monte-Carlo AC. What I found is that the training process is very unstable, although ...
Leafstar
0 votes
0 answers
56 views

I tried to use vanilla AC to solve several of the classic control problems and none of them worked, even the simple CartPole environment. I used a shared layer of 128 neurons for both actor and critic ...
Leafstar
1 vote
1 answer
158 views

I'm doing research on actor-critic methods and I want to make sure that I understand them correctly. First of all, I understand that as it's a combination of value-based and policy-based ...
marc_spector
0 votes
0 answers
41 views

I need to implement a policy gradient algorithm (actor-critic) for the 2D cutting stock problem with stocks of varied sizes. However, I'm new to machine learning, so I still have no clue how to design the ...
Phạm Trần Minh Trí
3 votes
1 answer
124 views

I have a question about whether the Deterministic Policy Gradient algorithm in its basic form is policy-based or actor-critic. I have been searching for the answer for a while and in some cases it ...
marc_spector
1 vote
1 answer
184 views

See slide 92 here: https://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture14.pdf
I did not understand the expression for the critic in the update of $\Delta \phi$, and also why it is updated at the end of ...
DSPinfinity (reputation 1,223)
1 vote
0 answers
77 views

Currently I am reading the OpenAI Spinning Up documentation about policy gradient and actor-critic methods. On this webpage ("replace Return with Action value"), I think they are trying to prove that the ...
jim1124 (reputation 13)
0 votes
1 answer
93 views

What is the difference between a rollout buffer and a replay buffer (as used in DQNs)? Why can't they be used interchangeably? Why is the trajectory sampling parallelized? Is it just for making data ...
DeadAsDuck
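To make the distinction in this question concrete, here is a minimal sketch contrasting a DQN-style replay buffer, which keeps a bounded history of transitions from many past policies and samples from it at random, with an on-policy rollout buffer of the kind used by A2C/PPO-style methods, which holds only data collected by the current policy and is emptied after each update. The class and method names are hypothetical, not taken from any particular library.

```python
import random
from collections import deque

class ReplayBuffer:
    """Off-policy (DQN-style): bounded history, sampled uniformly and reused."""
    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)  # oldest transitions are evicted when full

    def add(self, transition):
        self.data.append(transition)

    def sample(self, batch_size):
        # Random minibatch; may mix transitions collected by many older policies
        return random.sample(list(self.data), batch_size)

class RolloutBuffer:
    """On-policy (A2C/PPO-style): only data from the current policy."""
    def __init__(self):
        self.data = []

    def add(self, transition):
        self.data.append(transition)

    def get_and_clear(self):
        # Used for the current update (PPO may run a few epochs over it),
        # then discarded because the next policy will differ
        batch, self.data = self.data, []
        return batch

replay = ReplayBuffer(capacity=3)
for t in range(5):
    replay.add(t)
print(list(replay.data))  # [2, 3, 4]: the two oldest were evicted

rollout = RolloutBuffer()
for t in range(3):
    rollout.add(t)
batch = rollout.get_and_clear()
print(batch, rollout.data)  # [0, 1, 2] []
```

One reason the two are not interchangeable: an on-policy gradient estimate assumes the data came from the current policy, so reusing stale transitions from a replay buffer biases it unless a correction such as importance weighting is applied.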
1 vote
1 answer
79 views

As stated in Sutton and Barto: In REINFORCE with baseline, the learned state-value function estimates the value of the first state of each state transition. This estimate sets a baseline for the ...
DeadAsDuck
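A minimal sketch of the distinction in the quoted passage, assuming one finished episode with rewards `rewards[t]` (R_{t+1} in Sutton & Barto's indexing) and critic estimates `values[t]` = v̂(S_t); the function names are hypothetical. REINFORCE with baseline compares the full return G_t against v̂(S_t), which is estimated before A_t is taken, whereas the one-step actor-critic TD error also bootstraps from v̂(S_{t+1}), the value of the second state of the transition.

```python
def discounted_returns(rewards, gamma):
    # G_t = R_{t+1} + gamma * G_{t+1}, accumulated backwards from the episode end
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

def baseline_advantages(rewards, values, gamma):
    # REINFORCE with baseline: G_t - v_hat(S_t); only the first state's value is used
    return [g - v for g, v in zip(discounted_returns(rewards, gamma), values)]

def td_errors(rewards, values, gamma):
    # One-step actor-critic: R_{t+1} + gamma * v_hat(S_{t+1}) - v_hat(S_t),
    # with the value after the final step taken as 0 (terminal)
    next_values = values[1:] + [0.0]
    return [r + gamma * nv - v for r, nv, v in zip(rewards, next_values, values)]

rewards = [1.0, 1.0, 1.0]
values = [1.0, 1.0, 1.0]
print(baseline_advantages(rewards, values, gamma=0.5))  # [0.75, 0.5, 0.0]
print(td_errors(rewards, values, gamma=0.5))            # [0.5, 0.5, 0.0]
```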
2 votes
1 answer
123 views

I'm trying to solve MountainCar and CarRacing (I love cars) in the Gym environment with DDPG, and my algorithm struggles. I have thought about why this doesn't work and I would like to know if my reasoning is wrong. ...
Cauchy_Chlasse
1 vote
1 answer
279 views

I am trying to create a car-following model, for which I am using DDPG. My action is an acceleration bounded in the range [-3, 3] m/s². While training the model, for every state it gives a single ...
Aditya Mishra
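One common way to keep a DDPG actor inside a bounded action range, such as the [-3, 3] m/s² acceleration in this question, is to squash its output with tanh and rescale, then clip again after adding exploration noise. The sketch below assumes the actor network produces an unbounded scalar `raw`; the function names are hypothetical, and Gaussian noise is used for simplicity where the original DDPG paper used Ornstein-Uhlenbeck noise.

```python
import math
import random

A_MAX = 3.0  # acceleration bound from the question, in m/s^2

def scale_action(raw):
    # tanh maps the unbounded actor output into (-1, 1); rescale to the bound
    return A_MAX * math.tanh(raw)

def explore(action, noise_std, rng):
    # Add Gaussian exploration noise, then clip back into [-A_MAX, A_MAX]
    noisy = action + rng.gauss(0.0, noise_std)
    return max(-A_MAX, min(A_MAX, noisy))

rng = random.Random(0)
for raw in (-10.0, 0.0, 0.5, 10.0):
    a = scale_action(raw)
    print(raw, a, explore(a, noise_std=0.3, rng=rng))
```

Because tanh saturates, a pre-activation driven far from 0 yields near-zero gradients, so the actor can get stuck at ±A_MAX; logging `raw` during training is one way to spot this.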
