Questions tagged [reinforcement-learning]
The reinforcement-learning tag has no summary.
96 questions
2 votes
2 answers
267 views
Why does it make sense to minimize regret?
In fields such as game theory and reinforcement learning, it is standard to consider the regret-minimization strategy. I don't get the motivation for the definition. Yes, doing your best under worst-...
1 vote
0 answers
59 views
What is the design concept behind DeepSeek-R1?
I think R1 was born as an exploration of the AGI approach, and the design of the whole scheme is in line with Professor Sutton's philosophy of "searching + learning to find a meta-approach that ...
1 vote
1 answer
76 views
In multi-agent RL stochastic games (multi-agent version of an MDP), why does the Nash equilibrium policy only depend on the state, not history?
I am studying from the MARL textbook by Albrecht, Christianos and Schäfer. They define a stochastic game in Sec 3.3 as the multi-agent version of an MDP. In Fig 3.3 (pg 50) they give an intuition for ...
1 vote
1 answer
60 views
In reinforcement learning, does policy affect the maximization of the value?
Though the reward is assigned by the environment, once the policy $\pi$ is fixed, the action probabilities $\pi(a|s)$ in each state are determined. However, this means that given different ...
0 votes
1 answer
146 views
Main subjects to learn Artificial Intelligence in CS
In my PhD, I will work with ML models. I will only use ready-made models as a tool, but I want to delve deeper into Artificial Intelligence, not just to use ready-made models, but to ...
1 vote
0 answers
140 views
DeepMind AlphaDev: How did it use Reinforcement Learning to reduce the search space?
Google DeepMind recently published a new paper which describes how they used reinforcement learning to discover faster sorting algorithms. A summary of the paper is here and the paper is here. It ...
0 votes
2 answers
163 views
Multi-Armed Bandit - Reward Probabilities
I am new to reinforcement learning, and recently came across the following issue. When implementing a multi-armed bandit algorithm, we assume we have k machines with reward probabilities [p_1,..., p_k]...
1 vote
0 answers
164 views
Tabular Meta-Learning in RL
There are various meta-learning algorithms in RL that are proposed for settings where we have a (deep) neural network and the policy (or the value function) is parameterized as such. Can these methods ...
0 votes
1 answer
119 views
Reinforcement Learning Reward Function for Optimizing Golf Aim?
I read this article, which mentions that either here or Stack Overflow would be the best place to ask generic machine learning questions; however, if the question isn't programming specific with a ...
1 vote
0 answers
80 views
Long and short memory in reinforcement learning Connect 4 AI
I'm writing an AI based on reinforcement learning to play Connect 4. This is my second bot and my second attempt at RNNs and AI (the first was a copy of a Snake RNN AI's code from YouTube), and I'm looking for some ...
1 vote
0 answers
38 views
Evaluating the safety of deep RL algorithms
Are there any free/open-source environments, tasks, or datasets for evaluating deep RL algorithms in terms of safety? All available environments (like OpenAI's) are for simple games. These ...
1 vote
0 answers
62 views
Deep RL for healthcare: existing benchmarking datasets or environments
I am currently a Ph.D. student in the computer science department, and I was given the subject of Deep RL for Healthcare. However, after lots of research on the internet, I could not find any free dataset ...
2 votes
0 answers
55 views
Where to find the current state of the art performance of Deep RL algorithms?
Recently, I had an idea for a novel Deep RL algorithm that might perform better than existing algorithms such as DQN, TRPO, PPO, etc. However, I do not know of a website or a research paper that might ...
2 votes
1 answer
108 views
What makes Deep RL "fundamentally/mathematically" advantageous?
Note: I consider myself to be a beginner in the field of Deep RL. Deep RL has had tremendous success in recent years, such as playing Atari games and beating a Go champion. Therefore, considerable interest for ...
4 votes
2 answers
2k views
How to set up the Bellman equation as a linear system of equations
I was watching a video on Reinforcement Learning by Andrew Ng, and at about minute 23 of the video he mentions that we can represent the Bellman equation as a linear system of equations. I am talking ...
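For readers skimming this last question, here is a minimal sketch of the general idea, not taken from the video. It assumes a fixed policy, under which the Bellman equation $v = r + \gamma P v$ is linear in $v$ and can be rearranged to $(I - \gamma P) v = r$. The names `P`, `r`, `gamma`, and `evaluate_policy`, and the two-state numbers, are hypothetical and chosen only for illustration.

```python
import numpy as np

def evaluate_policy(P, r, gamma):
    """Solve (I - gamma * P) v = r for the state values of a fixed policy.

    P     : (n, n) state-transition matrix under the policy (rows sum to 1)
    r     : (n,) expected immediate reward in each state
    gamma : discount factor, assumed to satisfy 0 <= gamma < 1
    """
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)

# Hypothetical 2-state example: state 1 is absorbing with zero reward.
P = np.array([[0.5, 0.5],
              [0.0, 1.0]])
r = np.array([1.0, 0.0])
gamma = 0.9

v = evaluate_policy(P, r, gamma)
print(v)
```

The linear form applies to evaluating a fixed policy; the Bellman optimality equation contains a max over actions and is not linear, so it is usually handled with iterative methods instead.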