13
$\begingroup$

I know that certain probability distributions may be derived from the requirement that entropy be maximal along with a constraint such as fixed variance. In the case of fixed variance, for example, one finds the normal distribution. In particular, the maximisation is over the set of all (!) continuous PDFs with that fixed variance.
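For concreteness, the fixed-variance claim can be checked numerically against the standard closed-form differential entropies of a few unit-variance densities (a minimal sketch; the formulas are the textbook closed forms, not anything specific to this question):

```python
import math

# Differential entropies (in nats) of three unit-variance densities, from
# their standard closed forms:
#   Normal:  0.5 * ln(2*pi*e*sigma^2)
#   Laplace: 1 + ln(2b), with variance 2b^2  ->  b = 1/sqrt(2)
#   Uniform: ln(width),  with variance width^2/12  ->  width = sqrt(12)
h_normal = 0.5 * math.log(2 * math.pi * math.e)
h_laplace = 1 + math.log(2 / math.sqrt(2))
h_uniform = math.log(math.sqrt(12))

# The normal density beats the others, as the maximum entropy principle predicts.
assert h_normal > h_laplace > h_uniform
```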

Now my question is, is there a similarly general derivation of the Poisson distribution as a maximum entropy distribution? E.g. fixing that mean and variance are equal and maximising entropy? I have found a couple of articles but they always seem to prove maximality on a restricted set of discrete PDFs. Is it because there is no more general maximum entropy principle for the Poisson distribution? If so, is it because the discrete case is simply more complex than the continuous one?

$\endgroup$
5
  • $\begingroup$ The form of the maximum entropy distribution for the discrete case is provided in the wikipedia article. It does not appear that this form corresponds to the Poisson distribution for some set of constraints (see the Table of the above link). $\endgroup$ Commented Apr 19, 2017 at 11:57
  • $\begingroup$ Cf. here: arxiv.org/pdf/math/0603647v2.pdf and here: yaroslavvb.com/papers/harremoes-binomial.pdf. In both cases, the authors consider a subset of all discrete PDFs. $\endgroup$ Commented Apr 19, 2017 at 19:24
  • $\begingroup$ I thought you were asking about the entropy maximizing distribution out of all discrete PDFs. Clearly, if we restrict our search to a subset of PDFs, the Poisson may indeed be the entropy maximizer. As a trivial example, consider the maximizer from the set of two PDFs: (1) $\mathbb{P}(x=k)=e^{-\lambda}\frac{\lambda^k}{k!}$ (Poisson), and (2) $\mathbb{P}(x=0)=1, \mathbb{P}(x=l)=0, x\neq0$ (certain event). Of course, it is of (theoretical?) interest to find non-trivial sets of distributions for which the Poisson is the maximizer, and that's what these references are addressing. $\endgroup$ Commented Apr 19, 2017 at 19:41
  • $\begingroup$ Yes you understood my question correctly. Maybe I misunderstand your number (2), but it doesn't look like a PDF to me if it is only non-zero and equal to one at x=0 (not normalised). OK, so you're saying you believe the Poisson or Bernoulli distribution do not have a maximum entropy principle such as the normal distribution. In a way, that seems to indicate for me that they are more complicated things than for example the normal distribution. Why is it that they are more complicated? Is it just that discrete distributions are more tricky than continuous ones? $\endgroup$ Commented Apr 20, 2017 at 8:15
  • $\begingroup$ Apologies, it should be $l\neq 0$ (instead of $x\neq 0$) in (2). This is a perfectly valid PDF (although, trivial). However, you can easily replace (2) by some other PDF that has a smaller entropy that the Poisson. I would not say that the discrete distribution space is more complicated, since the maximum entropy distributions are obtained with the same methodology as in the continuous case. It only happens that the result is not Poisson (and I don't see why one should expect the Poisson dist. to be the entropy maximizer). $\endgroup$ Commented Apr 20, 2017 at 8:38

2 Answers

8
$\begingroup$

I believe the second paper you cited (by Harremoës) is actually the answer you're looking for. The Poisson distribution describes the number of occurrences of an event in a fixed interval, under the assumption that occurrences are independent. In particular, the requirement that the events be independent means that not every discrete distribution is a valid candidate for describing this system, and it motivates restricting attention to sums of independent Bernoulli random variables. Harremoës then shows that if you further constrain the expected value (i.e., $\lambda$), the maximum entropy distribution in this class is the Poisson distribution.

So, the Poisson distribution is the maximum entropy distribution given constraints of counting independent events and having a known expected value.
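This is easy to check numerically (a sketch I'm adding, not taken from the paper): $\mathrm{Bin}(n, \lambda/n)$ is a sum of $n$ independent Bernoulli$(\lambda/n)$ variables with mean $\lambda$, and its entropy increases with $n$ while staying below the Poisson entropy, consistent with the Poisson being the maximizer in this class.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def entropy(probs):
    # Shannon entropy in nats of a discrete pmf given as an iterable.
    return -sum(p * math.log(p) for p in probs if p > 0)

lam = 2.0
h_poisson = entropy(poisson_pmf(k, lam) for k in range(80))

# Bin(n, lam/n): a sum of n independent Bernoulli(lam/n) variables, mean lam.
hs = [entropy(binom_pmf(k, n, lam / n) for k in range(n + 1))
      for n in (4, 8, 16, 64, 256)]

assert all(a < b for a, b in zip(hs, hs[1:]))  # entropy grows with n
assert all(h < h_poisson for h in hs)          # and stays below the Poisson entropy
```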

That said, you can also easily reverse-engineer a (contrived) constraint for which the Poisson distribution would be the maximum entropy distribution.

Let our unknown constraint be $\mathbb{E}[f(k)] = c$. Maximizing the entropy subject to this constraint, along with normalisation and the mean being $\lambda$, amounts to minimizing the Lagrangian

$\sum_k p(k) \ln p(k) - \alpha \left( \sum_k p(k) - 1\right) - \beta\left(\sum_k k p(k) - \lambda\right) - \gamma \left( \sum_k p(k)f(k) - c \right)$,

where $\alpha$, $\beta$, and $\gamma$ are Lagrange multipliers. Taking the derivative with respect to $p(k)$ yields

$\ln p(k) = -1 + \alpha + \beta k + \gamma f(k)$.

We already know the Poisson distribution has the form $p(k) = e^{-\lambda}\lambda^k/k!$, or $\ln p(k) = -\lambda + k \ln(\lambda) - \ln(k!)$. Matching terms, we can guess that $f(k)$ has the functional form $\ln(k!)$, with multipliers $\gamma = -1$, $\beta = \ln(\lambda)$, and $\alpha = 1 - \lambda$.

So, the Poisson distribution maximizes entropy when $p$ has mean $\lambda$ and $\mathbb{E}(\ln k!) = $[some particular value depending on $\lambda$].
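One can sanity-check this numerically (a sketch under the stated constraints, using only the standard library; the truncation level $K$ and the test value $\lambda = 2$ are arbitrary choices). On a truncated support the renormalised Poisson has exactly the exponential-family form above, so any perturbation that preserves normalisation, the mean, and $\mathbb{E}[\ln k!]$ should lower the entropy:

```python
import math
import random

lam, K = 2.0, 30

# Truncated, renormalised Poisson(lam) on {0, ..., K-1}; it has exactly the
# form exp(alpha + beta*k + gamma*ln k!) with gamma = -1, beta = ln(lam).
raw = [lam**k / math.factorial(k) for k in range(K)]
Z = sum(raw)
p = [r / Z for r in raw]

def entropy(q):
    return -sum(x * math.log(x) for x in q if x > 0)

def dot_p(a, b):
    # Inner product weighted by p: dot_p(c, w) = 0 means E_p[c * w] = 0.
    return sum(pi * ai * bi for pi, ai, bi in zip(p, a, b))

# Constraint functions: normalisation, mean, and E[ln k!].
cons = [[1.0] * K,
        [float(k) for k in range(K)],
        [math.log(math.factorial(k)) for k in range(K)]]

# Gram-Schmidt under dot_p, then strip those directions from a random vector w.
# The perturbed pmf q_k = p_k * (1 + eps * w_k) then has exactly the same
# normalisation, mean, and E[ln k!] as p (the constraints are linear in p).
ortho = []
for c in cons:
    u = c[:]
    for o in ortho:
        coef = dot_p(u, o)
        u = [ui - coef * oi for ui, oi in zip(u, o)]
    nrm = math.sqrt(dot_p(u, u))
    ortho.append([ui / nrm for ui in u])

random.seed(0)
w = [random.uniform(-1, 1) for _ in range(K)]
for o in ortho:
    coef = dot_p(w, o)
    w = [wi - coef * oi for wi, oi in zip(w, o)]

h0 = entropy(p)
m = max(abs(wi) for wi in w)
for s in (0.5, -0.5):
    eps = s / m                     # keeps every 1 + eps*w_k strictly positive
    q = [pi * (1 + eps * wi) for pi, wi in zip(p, w)]
    assert min(q) > 0
    assert entropy(q) < h0          # constraint-preserving perturbations lower entropy
```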

This approach may not be very satisfying, since it's not clear why we would want a distribution with a specified expectation value of $\ln k!$. The Johnson paper you cited is (in my opinion) similarly unsatisfying, since it essentially proves that the Poisson distribution is the maximal entropy distribution among distributions which are "more log-convex than the Poisson distribution".

$\endgroup$
2
  • $\begingroup$ Thanks, that's interesting! Just a comment for my own understanding: in order to actually obtain the Poisson distribution, we need $-\lambda = -1 + \alpha$, $\ln(\lambda)=\beta$ and $\gamma =-1$. If I assume $\gamma=-1$, then the other two follow from the other two constraints. I don't think one can explicitly sum the 3rd constraint given $f(k)=\ln(k!)$ for general $\gamma$. However, it is not hard to see that $g(\gamma)=\sum_k p(k)f(k)=\sum_k e^{\alpha-1} e^{\beta k} (k!)^{\gamma}\ln(k!)$ is strictly increasing in $\gamma$ and $g(0)=\infty$, $g(-\infty)=0$, thus we can choose $c$ to enforce $\gamma=-1$. $\endgroup$ Commented Jul 19, 2019 at 22:24
  • $\begingroup$ "This approach may not be very satisfying, since it's not clear why we would want a distribution with a specified expectation value of $\ln k!$" is not true in some sense $\endgroup$ Commented Jun 7, 2022 at 4:53
3
$\begingroup$

I guess the $k!$ comes from a permutation term.

Suppose a composite system. Subsystem A has $n_a$ elements, while subsystem B has $n_b$ elements. There are ${{n_a+n_b}\choose{n_a}}$ ways to choose which elements go to A. Consider that elements in A may be in $m_a$ states, while elements in B may be in $m_b$ states. You may think of each subsystem as a bag with a given number of pockets. There will be $$ {{n_a+n_b}\choose{n_a}} m_a^{n_a} m_b^{n_b} $$ different states for the composite system, given the partition of the elements. Considering all states to be equally probable, the probability of having a partition with $n_a$ elements in A is proportional to the number of possible states for the composite system. Using the binomial theorem to normalise, the probability of a given partition is

$ P(n_a)={{n_a+n_b}\choose{n_a}} m_a^{n_a} m_b^{n_b}/(m_a+m_b)^{(n_a+n_b)} $

Letting $n_a+n_b$ and $m_b$ both grow to infinity while $m_a(n_a+n_b)/m_b=\mu$ remains constant yields the Poisson distribution $$ P(n_a)=e^{-\mu}\mu^{n_a}/n_a! $$
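This limit is easy to verify numerically (a sketch; the choices $m_a=1$, $\mu=2$ and the grid of $m_b$ values are mine). The probability $P(n_a)$ above is $\mathrm{Bin}\big(n_a+n_b,\, m_a/(m_a+m_b)\big)$ evaluated at $n_a$, and it approaches the Poisson pmf as $m_b$ grows with $m_a(n_a+n_b)/m_b$ held fixed:

```python
import math

def log_binom_pmf(k, n, p):
    # log of C(n, k) p^k (1-p)^(n-k), via lgamma to avoid huge integers
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def poisson_pmf(k, mu):
    return math.exp(-mu) * mu**k / math.factorial(k)

ma, mu = 1, 2.0
errs = []
for mb in (10**2, 10**4, 10**6):
    n = int(mu * mb / ma)          # n_a + n_b, so that ma*(n_a+n_b)/mb = mu
    p = ma / (ma + mb)             # probability an element lands in subsystem A
    errs.append(max(abs(math.exp(log_binom_pmf(k, n, p)) - poisson_pmf(k, mu))
                    for k in range(12)))

assert errs[0] > errs[1] > errs[2]  # error shrinks as mb grows
assert errs[-1] < 1e-4
```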

$\endgroup$
