  • $\begingroup$ Thank you for your feedback. I wanted to write an introduction to AC methods in which I mention the introduction of the loss function for the critic (having previously defined value-based and policy-based methods and the objective function). I was then thinking of defining the methods and their functions, starting with A2C and continuing with TRPO, PPO, and SAC, among others. For now, however, I want to focus on A2C. You said that I did not mention the advantage for the actor network, but I included it in the fourth equation. $\endgroup$ Commented Dec 7, 2024 at 23:29
  • $\begingroup$ Your plan to introduce A2C in the context of actor-critic methods sounds well structured and pedagogically sound. Beyond the clarification in my answer above, you may also want to consider why off-policy SAC needs only a critic target network, while DDPG needs target networks for both the critic and the actor. I hope this clarifies things and helps with your question. $\endgroup$ Commented Dec 8, 2024 at 3:48