Here is a more detailed explanation to complement the excellent answer by @Frank Harrell.
The test statistic of a likelihood-ratio test (LRT) is defined as (Wikipedia)
$$ \lambda_{\text{LR}} = -2(\ell_0 - \ell_A) $$
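where $\ell_i$ is the log likelihood of model $i$ and model $0$ is nested in model $A$. Under $H_0$, the statistic is asymptotically $\chi^2$-distributed with $q$ degrees of freedom, where $q$ is the number of parameters constrained under the null model: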
$$ \lambda_{\text{LR}} \overset{a}{\sim} \chi^2_q. $$
The AIC is defined as (Wikipedia)
$$ \text{AIC} = 2k - 2\ell $$
where $k$ is the number of estimated parameters and $\ell$ is the log likelihood.
The difference in AIC between two nested models (say model $0$ and model $A$, where model $A$ has $q = k_A - k_0$ more free parameters) is given by
\begin{align*} \Delta\text{AIC} &= 2k_0 - 2\ell_0 - (2k_A - 2\ell_A) \\ &= -2q - 2(\ell_0 - \ell_A). \end{align*}
Therefore,
$$ \Delta\text{AIC} + 2q = \underbrace{-2(\ell_0 - \ell_A)}_{\lambda_{\text{LR}}}. $$
This shows a direct relationship between AIC and the LRT:
- The AIC of both models will be equal if $\lambda_{\text{LR}} = 2q$
- The AIC of the null model will be smaller if $\lambda_{\text{LR}} < 2q$
- The AIC of the alternative model will be smaller if $\lambda_{\text{LR}} > 2q$
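The identity can be checked numerically. The following is a minimal sketch, assuming two nested Gaussian linear models fitted to R's built-in `mtcars` data; the data set and the choice of predictors are illustrative assumptions, not part of the original argument.

```r
# Two nested linear models on the built-in mtcars data
# (data set and predictors are illustrative assumptions).
fit0 <- lm(mpg ~ wt, data = mtcars)              # null model
fitA <- lm(mpg ~ wt + hp + qsec, data = mtcars)  # alternative model
q <- 2                                           # extra free parameters in fitA

# Likelihood-ratio statistic: -2 * (l_0 - l_A)
l0 <- as.numeric(logLik(fit0))
lA <- as.numeric(logLik(fitA))
lambda_LR <- -2 * (l0 - lA)

# AIC difference, null model minus alternative model
delta_AIC <- AIC(fit0) - AIC(fitA)

# Delta AIC + 2q should equal lambda_LR up to floating-point error
identity_holds <- isTRUE(all.equal(delta_AIC + 2 * q, lambda_LR))

# AIC prefers the alternative exactly when lambda_LR exceeds 2q
same_decision <- (AIC(fitA) < AIC(fit0)) == (lambda_LR > 2 * q)

print(c(identity_holds = identity_holds, same_decision = same_decision))
```

Both flags should be `TRUE` for any pair of nested models fitted by maximum likelihood to the same data, since the identity is purely algebraic.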
If we select models by AIC, we implicitly apply an LRT and check whether $\lambda_{\text{LR}}$ is larger or smaller than $2q$. The $2q$ threshold corresponds to a specific p-value of the LRT, which can be calculated in R using `pchisq(2 * q, df = q, lower.tail = FALSE)`. The following table lists these p-values for different values of $q$.
$$ \begin{array}{rrr} \hline q & \lambda_{\text{LR}} & p\text{-value} \\ \hline 1 & 2 & 0.157 \\ 2 & 4 & 0.135 \\ 3 & 6 & 0.112 \\ 5 & 10 & 0.075 \\ 10 & 20 & 0.029 \\ 20 & 40 & 0.005 \\ \end{array} $$
For example, selecting between two models based on AIC, where the nested model has 3 parameters constrained compared to the alternative one, is equivalent to performing an LRT and rejecting the null model at a significance level of $0.112$.
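The implied significance levels can be computed in a few lines of R. This is a small sketch; the variable names are chosen here for illustration.

```r
# Number of parameters constrained under the null model
q <- c(1, 2, 3, 5, 10, 20)

# AIC prefers the alternative model when lambda_LR exceeds 2q
threshold <- 2 * q

# Upper-tail chi-squared probability at the threshold, with q degrees of freedom
p_value <- pchisq(threshold, df = q, lower.tail = FALSE)

aic_table <- data.frame(q = q, lambda_LR = threshold, p_value = round(p_value, 3))
print(aic_table)
```

For $q = 2$ the p-value has the closed form $e^{-2} \approx 0.135$, because a $\chi^2_2$ variable is exponentially distributed with mean 2.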
A similar relationship holds between the LRT and the BIC, which is defined as (Wikipedia)
$$ \text{BIC} = \log(n)k - 2\ell $$
where $n$ is the number of observations, $k$ is the number of estimated parameters, and $\ell$ is the log likelihood. Using the same approach as above, with $\Delta\text{BIC} = \text{BIC}_0 - \text{BIC}_A$, we see that
$$ \Delta\text{BIC} + \log(n)q = \underbrace{-2(\ell_0 - \ell_A)}_{\lambda_{\text{LR}}}. $$
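Spelled out, with $\Delta\text{BIC}$ taken as the BIC of model $0$ minus the BIC of model $A$ and $q = k_A - k_0$, the intermediate steps are
\begin{align*} \Delta\text{BIC} &= \log(n)k_0 - 2\ell_0 - (\log(n)k_A - 2\ell_A) \\ &= -\log(n)q - 2(\ell_0 - \ell_A), \end{align*}
so BIC prefers the alternative model exactly when $\lambda_{\text{LR}} > \log(n)q$.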
The following table lists the LRT p-values that correspond to the BIC threshold $\lambda_{\text{LR}} = \log(n)q$ for different values of $n$ and $q$.
$$ \begin{array}{rrrr} \hline n & q & \lambda_{\text{LR}} & p\text{-value} \\ \hline 10 & 1 & 2.30 & 0.1292 \\ 10 & 2 & 4.61 & 0.1000 \\ 10 & 3 & 6.91 & 0.0749 \\ 100 & 1 & 4.61 & 0.0319 \\ 100 & 2 & 9.21 & 0.0100 \\ 100 & 3 & 13.82 & 0.0032 \\ \end{array} $$
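The BIC thresholds and their p-values can be computed the same way. This is a small sketch with illustrative variable names; `log()` in R is the natural logarithm, matching the BIC definition.

```r
# All combinations of sample size n and number of constrained parameters q
# (expand.grid varies its first argument fastest, so rows are ordered by n, then q)
bic_grid <- expand.grid(q = 1:3, n = c(10, 100))

# BIC prefers the alternative model when lambda_LR exceeds log(n) * q
bic_grid$lambda_LR <- log(bic_grid$n) * bic_grid$q

# Upper-tail chi-squared probability at the threshold, with q degrees of freedom
bic_grid$p_value <- pchisq(bic_grid$lambda_LR, df = bic_grid$q, lower.tail = FALSE)

print(bic_grid[, c("n", "q", "lambda_LR", "p_value")])
```

For $q = 2$ the implied significance level is exactly $1/n$, since $P(\chi^2_2 > 2\log n) = e^{-\log n}$, which shows how quickly the BIC becomes stricter than the AIC as the sample size grows.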