AndreaL

You might want to have a look at the Wallis derivation.

https://en.wikipedia.org/wiki/Principle_of_maximum_entropy#The_Wallis_derivation

It has the advantage of being strictly combinatorial in nature, making no reference to information entropy as a measure of 'uncertainty', 'uninformativeness', or any other imprecisely defined concept.

The Wikipedia page is excellent, but let me add a simple example to illustrate the idea.

Suppose you have a die. If the die is fair, the average value of the number shown will be 3.5. Now, imagine you have a die for which the average value shown is a bit higher, let's say 4.

How can it do that? Well, it could do it in a zillion ways! It could, for example, show 4 every single time. Or it could show 3, 4, 5 with equal probability.

Let's say you want to write a computer program that simulates a die with average 4. How would you do it?

An interesting solution is this. You start with a fair die. You roll it many times (say 100) and get a bunch of numbers. If the average of these numbers is 4, you accept the sample. Otherwise you reject it and try again.

After many, many attempts, you finally get a sample with average 4. Now your computer program simply returns a number chosen at random from this sample.

Which numbers will it show? Well, you expect 1, for example, to be present a little, but probably less than 1/6 of the time, because a 1 lowers the average of the sample and increases the probability that the sample gets rejected.
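This accept/reject procedure is easy to try in code. Here is a minimal Python sketch (the function name, sample size, and tolerance are my own choices; waiting for the average to equal 4 exactly would take too long, so the sketch accepts samples whose average is merely close to 4):

```python
import random

def sample_constrained_rolls(target_mean=4.0, n_rolls=100, tol=0.05):
    """Roll a fair die n_rolls times; keep the whole sample only if its
    average is within tol of target_mean, otherwise reject and retry."""
    while True:
        rolls = [random.randint(1, 6) for _ in range(n_rolls)]
        if abs(sum(rolls) / n_rolls - target_mean) < tol:
            return rolls

sample = sample_constrained_rolls()
# Count how often each face appears in the accepted sample.
counts = {face: sample.count(face) for face in range(1, 7)}
```

Printing `counts` after a run typically shows the small faces (1 and 2) appearing noticeably below 1/6 of the time, since samples containing many of them tend to get rejected.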

In the limit of a very large sample, the numbers will be distributed according to this:

https://en.wikipedia.org/wiki/Maximum_entropy_probability_distribution#Discrete_distributions_with_specified_mean

which is the distribution with maximum entropy among the ones with the specified mean. Aha!
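That limiting distribution can also be computed directly. As the linked page shows, the maximum-entropy distribution on {1,…,6} with a specified mean has the form p_k ∝ exp(−λk); a short sketch (the function name and bisection bounds are my own choices) solves for λ numerically so that the mean comes out to 4:

```python
import math

def maxent_die(target_mean=4.0):
    """Max-entropy distribution on faces 1..6 with the given mean:
    p_k proportional to exp(-lam * k), with lam found by bisection."""
    def mean(lam):
        w = [math.exp(-lam * k) for k in range(1, 7)]
        z = sum(w)
        return sum(k * wk for k, wk in zip(range(1, 7), w)) / z

    # mean(lam) decreases as lam increases, so bisect on lam.
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if mean(mid) > target_mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    w = [math.exp(-lam * k) for k in range(1, 7)]
    z = sum(w)
    return [wk / z for wk in w]

probs = maxent_die(4.0)
```

In the result, p_1 comes out below 1/6 and p_6 above it, matching the intuition about which samples survive the rejection step.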
