
Suppose we have a dataset $x = \{x_1, \dots, x_n\}$ whose datapoints are i.i.d. draws $x_i \sim P(\cdot \mid \theta^*)$ from a known parametric family $P$ with true parameter $\theta^*$.

Then, for this dataset $x$, the likelihood $\mathcal{L}(\theta) = P(x \mid \theta) = \prod_i P(x_i \mid \theta)$ is a function of the parameter $\theta$. We can write it as $\mathcal{L}_x(\theta)$ to make the dependence on the dataset $x$ explicit.

This likelihood function takes different forms depending on the data $x$. How do you characterize the distribution of $\mathcal{L}_x(\theta)$ in function space?

I know that, via a first-order Taylor expansion, the mean of $\mathcal{L}_x(\theta)$ is approximately $\mathcal{L}_{\mu_x}(\theta)$, the likelihood evaluated at the mean dataset. What about the variance, and other properties of the distribution of the likelihood?

I'm not very deep into statistics and don't really understand the role of moments and moment-generating functions (MGFs). How do these come into play? Is it at all analogous to higher-order terms in a Taylor expansion of a distribution?

TLDR: What's the connection between the distribution of the data, $P(\cdot \mid \theta^*)$, and the distribution of the likelihood function $\mathcal{L}_x(\theta) = P(x \mid \theta)$ for a dataset $x$ drawn from $P(\cdot \mid \theta^*)$?
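For concreteness, here is a minimal simulation sketch (all names and the choice of a Normal $\mathcal N(\theta^*, 1)$ model are my own assumptions, just to make the question visible): each dataset $x$ drawn from $P(\cdot \mid \theta^*)$ produces one log-likelihood curve, and overlaying many such curves shows the "distribution in function space" I'm asking about.

```python
# Sketch: overlay log-likelihood curves from many simulated datasets,
# assuming x_i ~ N(theta*, 1) i.i.d. (an illustrative choice, not given above).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
theta_star, n, n_datasets = 0.0, 20, 200
theta_grid = np.linspace(-1.5, 1.5, 300)

for _ in range(n_datasets):
    x = rng.normal(theta_star, 1.0, size=n)  # one dataset from P(.|theta*)
    # log L_x(theta) = sum_i log phi(x_i; theta, 1), evaluated on the grid
    loglik = (-0.5 * (x[:, None] - theta_grid[None, :]) ** 2).sum(axis=0) \
             - 0.5 * n * np.log(2 * np.pi)
    plt.plot(theta_grid, loglik, color="gray", alpha=0.1)

plt.xlabel(r"$\theta$")
plt.ylabel(r"$\log \mathcal{L}_x(\theta)$")
plt.title("Log-likelihood curves across simulated datasets")
plt.show()
```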


1 Answer

The likelihood function $L(\theta|X)$ is indeed a random function (since $X$ is random), taking values in the set of likelihood functions $$\{L(\cdot|x);\ x\in\mathsf X\}.$$ Its distribution depends (obviously) on the statistical model. For instance:

  1. if $L(\theta|X)$ is the likelihood function attached to a Normal $\mathcal N(\theta,1)$ $n$-sample, and if $\theta^*$ is the true value of $\theta$, then $$-2\log L(\theta^*|X_{1:n}) - n\log(2\pi) = \sum_{i=1}^n (X_i-\theta^*)^2 \sim \chi^2_n$$ and, for an arbitrary $\theta$, $$-2\log L(\theta|X_{1:n}) - n\log(2\pi) = \sum_{i=1}^n (X_i-\theta)^2 \sim \chi^2_n\big(n(\theta-\theta^*)^2\big),$$ a noncentral $\chi^2$ distribution;
  2. if $L(\theta|X)$ is the likelihood function attached to a Poisson $\mathcal P(\theta)$ $n$-sample, and if $\theta^*$ is the true value of $\theta$, $$\log L(\theta|X_{1:n}) = \log(\theta) \sum_{i=1}^n X_i - n\theta - \sum_{i=1}^n \log (X_i!),$$ meaning that $$\left\{\log L(\theta|X_{1:n})+\sum_{i=1}^n \log (X_i!) + n\theta\right\}\Big/\log(\theta) = \sum_{i=1}^n X_i \sim \mathcal P(n\theta^*).$$
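As a sanity check on both claims above, here is a minimal Monte Carlo sketch (my own addition, assuming NumPy/SciPy; the constants $n$, $\theta^*$, and the number of replications are arbitrary choices):

```python
# Monte Carlo check of the two distributional claims above.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, theta_star, reps = 10, 2.0, 50_000

# Claim 1 (Normal): -2 log L(theta*|X) - n log(2 pi) = sum_i (X_i - theta*)^2
X = rng.normal(theta_star, 1.0, size=(reps, n))
stat = ((X - theta_star) ** 2).sum(axis=1)
# KS test against chi^2_n: p-value should be large (no rejection)
print(stats.kstest(stat, stats.chi2(df=n).cdf))

# Claim 2 (Poisson): the rearranged statistic equals sum_i X_i ~ P(n theta*)
X = rng.poisson(theta_star, size=(reps, n))
S = X.sum(axis=1)
# compare the empirical pmf of S with Poisson(n * theta*) at a few points
for k in (15, 20, 25):
    print(k, (S == k).mean(), stats.poisson(n * theta_star).pmf(k))
```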
  • Thanks for your response! So from what I can tell, you wrote out the log-likelihood function and rearranged terms until some known distribution appeared on the RHS? But isn't that just inverting the function, so that we know the distribution of [a function of the log-likelihood]? Commented Feb 10, 2021 at 1:51
  • The examples were given to show that the distribution of the likelihood function does depend on the model. I rearranged the terms to end up with a standard distribution rather than a non-standard one. Asymptotically, the distribution of the (iid sample) log-likelihood is Normal. Commented Feb 10, 2021 at 5:29
  • I see, so the takeaway is that the distribution of the log-likelihood 1) depends on the data distribution and 2) isn't generally a standard distribution. Your second sentence interests me -- can you explain more what you mean by "the distribution of the (iid sample) log-likelihood"? Is this like the CLT? Commented Feb 10, 2021 at 15:26
  • For a given value of $\theta$, $\log L(\theta|X_{1:n})$ is a sum of iid rvs, hence the CLT should apply under some appropriate conditions. Commented Feb 10, 2021 at 16:37
  • Ah, thanks! So does that mean we can view the log-likelihood as a Gaussian process? (If the distribution of $\log L_x(\theta)$ for a fixed $\theta$ is Gaussian, would every joint and marginal distribution of $\log L_x(\theta)$ across different $\theta$ also be Gaussian?) Commented Feb 10, 2021 at 20:53
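To illustrate the CLT point raised in the last two comments, here is a short sketch (again an assumed Normal model and arbitrary constants, not part of the answer): for a fixed $\theta$, the log-likelihood is a sum of $n$ i.i.d. terms $\log p(X_i \mid \theta)$, so after standardizing it should look approximately Gaussian for large $n$.

```python
# CLT illustration: distribution of log L(theta|X_{1:n}) at a fixed theta,
# assuming X_i ~ N(theta*, 1) i.i.d. (an illustrative model choice).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
theta_star, theta, n, reps = 0.0, 0.5, 200, 20_000

X = rng.normal(theta_star, 1.0, size=(reps, n))
# per-observation log density log phi(X_i; theta, 1)
terms = -0.5 * (X - theta) ** 2 - 0.5 * np.log(2 * np.pi)
loglik = terms.sum(axis=1)  # one log-likelihood value per simulated dataset

# standardize empirically; the KS statistic against N(0,1) then measures
# how Gaussian the shape is, and it shrinks as n grows
z = (loglik - loglik.mean()) / loglik.std()
print(stats.kstest(z, stats.norm.cdf))
```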
