
Questions tagged [thompson-sampling]

For questions about Thompson sampling, a technique for choosing actions (one that addresses the exploration-exploitation dilemma) in multi-armed bandit and reinforcement learning problems.
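For readers new to the tag, the core idea can be sketched in a few lines. The following is a minimal Bernoulli Thompson sampling loop with Beta(1, 1) priors; the `pull` callback and arm count are hypothetical stand-ins for whatever environment you have:

```python
import random

def thompson_sampling(pull, n_arms, n_rounds):
    """Bernoulli Thompson sampling with Beta(1, 1) priors.

    `pull(arm)` is a hypothetical environment callback returning a
    0/1 reward for the chosen arm.
    """
    successes = [1] * n_arms  # Beta alpha parameters
    failures = [1] * n_arms   # Beta beta parameters
    total_reward = 0
    for _ in range(n_rounds):
        # Sample a plausible mean reward for each arm from its posterior...
        samples = [random.betavariate(successes[a], failures[a])
                   for a in range(n_arms)]
        # ...and act greedily with respect to the sampled values.
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = pull(arm)
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return total_reward
```

Exploration comes for free: arms with little data have wide posteriors, so their samples occasionally exceed the current best arm's.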

2 votes
1 answer
223 views

When considering multi-armed bandits in different formats, UCB, $\epsilon$-greedy, Thompson sampling, etc. seem greedy/myopic in the sense that they solely consider the reward for the current timestep. ...
hugh • 53
1 vote
0 answers
49 views

The posterior sampling lemma was introduced in "(More) Efficient RL via Posterior Sampling" and looks like this: $M^*$ here is the true MDP, while $M_k$ is the MDP sampled from the posterior in episode $...
pecey • 353
2 votes
0 answers
34 views

Suppose that I'm training a machine learning model to predict people's age from a picture of their faces. Let's say that I have a dataset of people from 1-year-olds to 100-year-olds. But I want to choose ...
noone • 123
1 vote
0 answers
94 views

I'm working with the Online Logistic Regression Algorithm (Algorithm 3) of Chapelle and Li in their paper, "An Empirical Evaluation of Thompson Sampling" (https://papers.nips.cc/paper/2011/...
MABQ • 11
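For context on this question, here is a sketch in the spirit of Chapelle and Li's online Bayesian logistic regression (their Algorithm 3 uses a diagonal Laplace approximation to the weight posterior). The inner MAP optimizer here is a simplified gradient loop, and all hyperparameters are illustrative assumptions, not the paper's settings:

```python
import math
import random

class OnlineLogisticRegression:
    """Sketch of online Bayesian logistic regression with a diagonal
    Laplace approximation, in the spirit of Chapelle & Li's Algorithm 3.
    The inner optimizer and hyperparameters are simplifications."""

    def __init__(self, dim, lam=1.0):
        self.m = [0.0] * dim  # posterior means
        self.q = [lam] * dim  # posterior precisions (inverse variances)

    def sample_weights(self):
        # Thompson step: draw one plausible weight vector from the posterior.
        return [random.gauss(m_i, q_i ** -0.5)
                for m_i, q_i in zip(self.m, self.q)]

    def predict(self, w, x):
        z = sum(wi * xi for wi, xi in zip(w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y, steps=50, lr=0.1):
        """MAP estimate via a few gradient steps (a stand-in for the
        paper's inner optimization), then a Laplace precision update."""
        w = list(self.m)
        for _ in range(steps):
            p = self.predict(w, x)
            for i in range(len(w)):
                # Gradient of the Gaussian prior term plus the logistic loss.
                grad = self.q[i] * (w[i] - self.m[i]) + (p - y) * x[i]
                w[i] -= lr * grad
        p = self.predict(w, x)
        self.m = w
        for i in range(len(x)):
            # Diagonal Hessian of the logistic loss at the MAP point.
            self.q[i] += x[i] * x[i] * p * (1.0 - p)
```

To run Thompson sampling with this model, call `sample_weights()` once per round, score each candidate action's feature vector with `predict`, play the argmax, and feed the observed click/no-click back through `update`.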
4 votes
0 answers
158 views

I am looking at the different existing methods of action selection in reinforcement learning. I found several methods like epsilon-greedy, softmax, upper confidence bound and Thompson sampling. I ...
user14053977
0 votes
1 answer
509 views

I often see Thompson Sampling in the RL literature; however, I am not able to relate it to any current RL techniques. How exactly does it fit with RL?
desert_ranger
3 votes
3 answers
1k views

Why aren't exploration techniques that are typically used in bandit problems, such as UCB or Thompson sampling, used in full RL problems? Monte Carlo Tree Search may use the above-mentioned methods in its ...
Mika • 371
1 vote
0 answers
65 views

Agrawal and Goyal (http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf page 3) discussed how we can extend Thompson sampling for Bernoulli bandits to Thompson sampling for stochastic bandits in ...
Felix P. • 295
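The extension the question refers to is, as I read the cited page, a reduction: each observed reward $r \in [0, 1]$ is replaced by the outcome of a Bernoulli trial with success probability $r$, so the Beta posterior updates from the Bernoulli case carry over unchanged. A minimal sketch, where `pull(arm)` is a hypothetical environment returning a reward in $[0, 1]$:

```python
import random

def thompson_stochastic(pull, n_arms, n_rounds):
    """Thompson sampling for general [0, 1]-valued rewards via the
    binarization trick from Agrawal & Goyal (2012): each reward r is
    replaced by a Bernoulli(r) sample before the Beta update."""
    alpha = [1] * n_arms
    beta = [1] * n_arms
    chosen = []
    for _ in range(n_rounds):
        samples = [random.betavariate(alpha[a], beta[a])
                   for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        r = pull(arm)                        # real-valued reward in [0, 1]
        b = 1 if random.random() < r else 0  # Bernoulli trial with mean r
        alpha[arm] += b
        beta[arm] += 1 - b
        chosen.append(arm)
    return chosen
```

The trial preserves the mean reward of each arm, which is all the Beta-Bernoulli analysis needs.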
1 vote
1 answer
5k views

I ran a test using 3 strategies for multi-armed bandit: UCB, $\epsilon$-greedy, and Thompson sampling. The results for the rewards I got are as follows: Thompson sampling had the highest average ...
Java coder
4 votes
2 answers
3k views

In policy gradient algorithms, the output is a stochastic policy: a probability for each action. I believe that if I follow the policy (sample an action from the policy) I make use of exploration ...
gnikol • 177
8 votes
0 answers
174 views

In my implementation of Thompson Sampling (TS) for online Reinforcement Learning, my distribution for selecting $a$ is $\mathcal{N}(Q(s, a), \frac{1}{C(s,a)+1})$, where $C(s,a)$ is the number of times ...
Kevin • 81
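The questioner's Gaussian selection rule can be sketched directly (reading $\frac{1}{C(s,a)+1}$ as the variance, which the notation leaves ambiguous). The `Q` and `C` dict-of-dicts tables are hypothetical:

```python
import random

def select_action(Q, C, state, actions):
    """Gaussian Thompson-style rule as described in the question:
    score each action by a draw from N(Q(s, a), 1/(C(s, a) + 1)),
    where C counts visits, so rarely tried actions get noisier
    (more exploratory) draws. Q and C are hypothetical tables."""
    def score(a):
        mean = Q[state][a]
        std = (1.0 / (C[state][a] + 1)) ** 0.5  # variance 1/(C+1)
        return random.gauss(mean, std)
    return max(actions, key=score)
```

As `C(s, a)` grows, the draw concentrates on `Q(s, a)` and the rule becomes greedy.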
5 votes
1 answer
833 views

In some implementations of off-policy Q-learning, we need to know the action probabilities given by the behavior policy $\mu(a)$ (e.g., if we want to use importance sampling). In my case, I am using ...
nicolas • 53