I haven't quite got this straight yet, but I think one way to go is to think about choosing points at random from the positive reals. This answer is going to be rather longer than it really needs to be, because I'm thinking about this in a few (closely related) ways, which probably aren't all necessary, and you can decide to reject the uninteresting parts and keep anything of value. Very roughly, the idea is that if you "randomly" choose points from the positive reals and arrange them in increasing order, then the probability that the $(n+1)^\text{th}$ point is in a small interval $(t,t+dt)$ is a product of probabilities of independent events: $n$ factors of $t$ for choosing $n$ points in the interval $[0,t]$, one factor of $e^{-t}$ as all the other points are in $[t,\infty)$, one factor of $dt$ for choosing the point in $(t,t+dt)$, and a denominator of $n!$ coming from the reordering. At least, as an exercise in making a simple problem much harder, here goes...

I'll start with a bit of theory before trying to describe intuitively why the probability density $\frac{t^n}{n!}e^{-t}$ pops out.
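
Just restating the heuristic above in a single display (a heuristic, not yet a proof):
$$\mathbb{P}\bigl(T_{n+1}\in(t,t+dt)\bigr)\approx\underbrace{t^n}_{n\text{ points in }[0,t]}\times\underbrace{e^{-t}}_{\text{rest in }[t,\infty)}\times\underbrace{dt}_{\text{one point in }(t,t+dt)}\times\underbrace{\frac{1}{n!}}_{\text{reordering}},$$
which is exactly the density in question. The rest of the answer unpacks where each factor comes from.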

We can look at the homogeneous Poisson process (with rate parameter $1$). One way to think of this is to take a sequence of independent exponentially distributed random variables with rate parameter $1$, $S_1,S_2,\ldots$, and set $T_n=S_1+\cdots+S_n$. As has been commented on already, $T_{n+1}$ has the probability density function $\frac{t^n}{n!}e^{-t}$. I'm going to avoid proving this immediately though, as it would just reduce to manipulating some integrals. Then, the Poisson process $X(t)$ counts the number of the times $T_i$ lying in the interval $[0,t]$.
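
For the curious, that integral manipulation is short; here is a sketch, by induction on $n$, using $T_{n+1}=T_n+S_{n+1}$ and the convolution formula for the density of a sum:
$$f_{T_{n+1}}(t)=\int_0^t f_{T_n}(s)\,e^{-(t-s)}\,ds=\int_0^t\frac{s^{n-1}}{(n-1)!}e^{-s}\,e^{-(t-s)}\,ds=\frac{e^{-t}}{(n-1)!}\int_0^t s^{n-1}\,ds=\frac{t^n}{n!}e^{-t},$$
with the base case $f_{T_1}(t)=e^{-t}=\frac{t^0}{0!}e^{-t}$.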

We can also look at Poisson point processes (a.k.a. Poisson random measures, but that Wikipedia page is very poor). This just makes rigorous the idea of randomly choosing unordered sets of points from a sigma-finite measure space $(E,\mathcal{E},\mu)$. Technically, it can be defined as a set of nonnegative integer-valued random variables $\{N(A)\colon A\in\mathcal{E}\}$ counting the number of points chosen from each subset $A$, such that $N(A)$ has the Poisson distribution of rate $\mu(A)$ and $N(A_1),N(A_2),\ldots$ are independent for pairwise disjoint sets $A_1,A_2,\ldots$. By definition, this satisfies
$$\mathbb{P}(N(A)=n)=\frac{\mu(A)^n}{n!}e^{-\mu(A)}.\tag{1}$$
The points $T_1,T_2,\ldots$ defining the homogeneous Poisson process above also define a Poisson random measure with respect to the Lebesgue measure on $(\mathbb{R}_+,\mathcal{B},\lambda)$ (once you forget about the order in which they were defined and just regard them as a random set, that is), and this forgetting of the order is, I think, the source of the $n!$. If you think about the probability of $T_{n+1}$ being in a small interval $(t,t+\delta t)$, then this is just the same as having $N([0,t])=n$ and $N((t,t+\delta t))=1$ which, by independence and $(1)$, has probability $\frac{t^n}{n!}e^{-t}\cdot e^{-\delta t}\,\delta t\approx\frac{t^n}{n!}e^{-t}\,\delta t$.

So, how can we choose points at random so that each small set $\delta A$ has probability $\mu(\delta A)$ of containing a point, and why does $(1)$ pop out? I'm imagining a hopeless darts player randomly throwing darts about and, purely by luck, hitting the board with some of them. Consider throwing a very large number $N\gg1$ of darts, independently, so that each one only has probability $\mu(A)/N$ of hitting the set $A$, and is distributed according to the probability distribution $\mu/\mu(A)$. This is consistent, at least, if you think about the probability of hitting a subset $B\subseteq A$. The probability of missing with all of them is $(1-\mu(A)/N)^N\to e^{-\mu(A)}$, and this is a multiplicative function of the set due to the independence of the numbers of darts hitting disjoint sets. To get the probability of exactly one dart hitting the set, multiply by $\mu(A)$ (one factor of $\mu(A)/N$ for each individual dart, multiplied by $N$ because there are $N$ of them to choose from). For $n$ darts, we multiply by $\mu(A)$ a total of $n$ times, once for each dart picked to hit, then divide by $n!$ because we have over-counted the subsets of size $n$ by this factor (due to counting all $n!$ ways of ordering them). This gives $(1)$. I think this argument can probably be cleaned up a bit.
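
In the spirit of that clean-up, here is the same count done as a worked limit: the number of the $N$ darts hitting $A$ is binomial, so for fixed $n$,
$$\mathbb{P}(N(A)=n)=\lim_{N\to\infty}\binom{N}{n}\left(\frac{\mu(A)}{N}\right)^n\left(1-\frac{\mu(A)}{N}\right)^{N-n}=\frac{\mu(A)^n}{n!}e^{-\mu(A)},$$
since $\binom{N}{n}/N^n\to 1/n!$ (this is where the $n!$ from the reordering shows up) and the last factor tends to $e^{-\mu(A)}$.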

Getting back to choosing points randomly on the positive reals, this gives a probability of $\frac{t^n}{n!}e^{-t}\,dt$ of picking $n$ points in the interval $[0,t]$ and one in $(t,t+dt)$. If we sort them in order as $T_1\lt T_2\lt\cdots$ then $\mathbb{P}(T_1\gt t)=e^{-t}$, so it is exponentially distributed. Conditional on this, $T_2,T_3,\ldots$ are chosen randomly from $[T_1,\infty)$, so we see that the differences $T_{i+1}-T_{i}$ are independent and identically distributed.
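
If you want to convince yourself numerically, here is a quick sanity check (a sketch using numpy; the window length `L` and the values of `n` and `trials` are my arbitrary choices): it builds $T_{n+1}$ both as a sum of $n+1$ independent exponentials and as the $(n+1)^\text{th}$ smallest point of a rate-$1$ Poisson scatter, and the two empirical distributions should agree.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, L = 5, 100_000, 50.0

# (a) T_{n+1} as a sum of n+1 independent Exp(1) interarrival times.
T_sums = rng.exponential(1.0, size=(trials, n + 1)).sum(axis=1)

# (b) T_{n+1} as the (n+1)th smallest point of a rate-1 Poisson scatter
# on [0, L]: a Poisson(L) number of points, placed uniformly at random.
counts = rng.poisson(L, size=trials)
T_scatter = np.array([
    np.sort(rng.uniform(0.0, L, size=k))[n]  # index n = (n+1)th smallest
    for k in counts if k > n                 # k <= n is vanishingly rare
])

# Both should have mean n+1 = 6 and match the density t^n e^{-t} / n!.
print(T_sums.mean(), T_scatter.mean())
```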

Why is $\frac{t^n}{n!}e^{-t}$ maximized at $t=n$? I'm not sure why the mode should be a simple property of a distribution; it isn't even well defined except for unimodal distributions. As $T_{n+1}$ is the sum of $n+1$ IID random variables of mean one, the law of large numbers suggests that it should be peaked approximately around $n$. The central limit theorem goes further, and gives $\frac{t^n}{n!}e^{-t}\approx\frac{1}{\sqrt{2\pi n}}e^{-(t-n)^2/(2n)}$. Stirling's formula is just this evaluated at $t=n$.
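
Spelling out those last two sentences: differentiating,
$$\frac{d}{dt}\left(\frac{t^n}{n!}e^{-t}\right)=\frac{t^{n-1}(n-t)}{n!}e^{-t},$$
which vanishes at $t=n$, confirming the mode; and setting $t=n$ in the central limit approximation gives $\frac{n^n}{n!}e^{-n}\approx\frac{1}{\sqrt{2\pi n}}$, which rearranges to Stirling's formula $n!\approx\sqrt{2\pi n}\,(n/e)^n$.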

What does this have to do with Tate's thesis? I don't know, and I haven't read it (but intend to), though I have a vague idea of what it's about. If there is a connection, maybe it is something to do with the fact that we are relating the sums of independent random variables $S_1+\cdots+S_n$ distributed with respect to the Haar measure on the multiplicative group $\mathbb{R}_+$ (edit: oops, that's not true; the multiplicative Haar measure has cumulative distribution given by $\log$, not $\exp$) with randomly chosen sets according to the Haar measure on the additive group $\mathbb{R}$.
