
I have the log-likelihood function: $$l(\overrightarrow\beta)=\sum_{i=1}^n \left[y_i \log\bigl(p(\overrightarrow x_i;\overrightarrow\beta)\bigr)+(1-y_i)\log\bigl(1-p(\overrightarrow x_i;\overrightarrow\beta)\bigr)\right] $$

where $p(\overrightarrow x_i;\overrightarrow\beta)=\frac{e^{\overrightarrow\beta^T\overrightarrow x_i}}{1+e^{\overrightarrow\beta^T\overrightarrow x_i}}$, $\overrightarrow\beta=(0,\beta_1)^T$ is the parameter vector, and $\overrightarrow x$ is the matrix of inputs, whose first column is all 1's.

The two classes are $y_i=0$ or $1$, and since there is a single binary regressor, $\overrightarrow x$ is an $n\times 2$ matrix whose second-column entries are $0$ or $1$, so each row $\overrightarrow x_i$ is either $(1,0)$ or $(1,1)$.

Additionally, $n_{1,0}$ denotes the number of observations with $x_i=1$ and $y_i=0$, and $n_{1,1}$ denotes the number of observations with $x_i=1$ and $y_i=1$.

The maximum likelihood estimator of $\beta_1$ is claimed to be $\log\frac{n_{1,1}}{n_{1,0}}$, but I can't see why that's the case. I know how to find the first derivative of the log-likelihood function: $$\frac{\partial l(\overrightarrow \beta)}{\partial\overrightarrow \beta}=\sum_{i=1}^n \overrightarrow x_i \bigl(y_i-p(\overrightarrow x_i;\overrightarrow\beta)\bigr)\tag{$*$}$$

I know that for maximization we would set this equal to zero, and I can see that ($*$) breaks into two component equations since $\overrightarrow x_i= (1,1)$ or $(1,0)$. For the first component we arrive at $\sum_{i=1}^n y_i = \sum_{i=1}^n p(\overrightarrow x_i;\overrightarrow\beta)$, but I'm not sure what the next step might be to arrive at the given result.


1 Answer


Since there's only one regressor $x$ and there's no intercept in the model, you can treat each $x_i$ as a scalar instead of a vector, and regard $\beta=\beta_1$ as a scalar instead of a vector. So I'll drop the vector notation from now on. Set the expression $ \sum_i[x_i(y_i-p(x_i;\beta))] $ to zero. This yields $$ \sum x_iy_i = \sum x_i p(x_i;\beta)\tag1 $$ The LHS of (1) simplifies to $n_{1,1}$ since the terms where $x_i=0$ or $y_i=0$ don't contribute.

Similarly the RHS of (1) simplifies to $$\sum_{x_i=1} p(x_i;\beta)= \#\{x_i=1\}\cdot p(1;\beta) = (n_{1,0} + n_{1,1}) e^{\beta_1}/(1+e^{\beta_1}) .$$

With these simplifications, (1) becomes $n_{1,1}=(n_{1,0}+n_{1,1})\,\frac{e^{\beta_1}}{1+e^{\beta_1}}$, which rearranges to $n_{1,1}=n_{1,0}e^{\beta_1}$, so $e^{\beta_1}=n_{1,1}/n_{1,0}$ and $\hat\beta_1=\log\frac{n_{1,1}}{n_{1,0}}$.
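If it helps, here is a minimal numerical sketch (not part of the original derivation): it simulates data from the no-intercept model, maximizes the log-likelihood over $\beta_1$ with scipy, and compares the result with the closed form $\log(n_{1,1}/n_{1,0})$. The sample size, seed, and true coefficient are arbitrary choices for illustration.

```python
# Sanity check: numerical MLE of beta_1 vs. the closed form log(n11 / n10).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=1000)                    # binary regressor
p_true = np.exp(0.7 * x) / (1 + np.exp(0.7 * x))     # true beta_1 = 0.7 (arbitrary), intercept fixed at 0
y = rng.binomial(1, p_true)                          # binary response

n11 = np.sum((x == 1) & (y == 1))
n10 = np.sum((x == 1) & (y == 0))

def neg_loglik(beta1):
    # negative of the log-likelihood l(beta) with the intercept fixed at 0
    p = np.exp(beta1 * x) / (1 + np.exp(beta1 * x))
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

beta_numeric = minimize_scalar(neg_loglik).x
beta_closed = np.log(n11 / n10)
print(beta_numeric, beta_closed)   # the two agree up to optimizer tolerance
```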

Added: If your model had two parameters, say $\vec\beta=(\beta_1,\beta_2)$, for the intercept and the binary regressor $x_i$ (sorry, the meaning of $\beta_1$ has changed), then your equation ($*$) would split into two equations for the two unknowns $\beta_1$ and $\beta_2$. The $k$th equation would involve the column $k$ of the $x$ matrix: $$\sum x_{i,k}y_i=\sum x_{i,k} p(\vec x_i, \vec\beta).\tag2$$

The equation for column $2$ would be simplified as before using $n_{1,1}$ and $n_{1,0}$: $$ n_{1,1} = (n_{1,0}+n_{1,1})p((1,1),\vec\beta)=(n_{1,0}+n_{1,1})\frac{e^{\beta_1+\beta_2}}{1+e^{\beta_1+\beta_2}}\tag3 $$ As for the intercept column, substitute $x_{i,1}=1$ for all $i$ to get: $$ \sum y_i =\sum p(\vec x_i, \vec\beta).\tag4 $$ The LHS would involve only cases where $y_i=1$, and the RHS would break into one sum where $x_i=0$ and one sum where $x_i=1$: $$ n_{0,1}+n_{1,1}=(n_{0,0}+n_{0,1})\frac{e^{\beta_1}}{1+e^{\beta_1}} +(n_{1,0}+n_{1,1})\frac{e^{\beta_1+\beta_2}}{1+e^{\beta_1+\beta_2}}\tag5 $$
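Just to spell out the algebra (this is not in the original answer, but follows from (3) and (5) above): equation (3) rearranges as before to $e^{\beta_1+\beta_2}=n_{1,1}/n_{1,0}$, and subtracting (3) from (5) leaves $n_{0,1}=(n_{0,0}+n_{0,1})\frac{e^{\beta_1}}{1+e^{\beta_1}}$, i.e. $e^{\beta_1}=n_{0,1}/n_{0,0}$. Hence $$\hat\beta_1=\log\frac{n_{0,1}}{n_{0,0}},\qquad \hat\beta_2=\log\frac{n_{1,1}}{n_{1,0}}-\log\frac{n_{0,1}}{n_{0,0}}=\log\frac{n_{1,1}\,n_{0,0}}{n_{1,0}\,n_{0,1}},$$ so the coefficient of the binary regressor is the familiar log odds ratio.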

  • Thank you very much, that makes things clearer. If I were looking at a similar problem where the model had an intercept, what could I do instead of treating $\beta$ and $x$ as scalars at (1)? I'm assuming I'd have four cases to consider: $n_{1,1}$, $n_{1,0}$, $n_{0,0}$, and $n_{0,1}$. — Commented Oct 15, 2018 at 22:08
  • @sk13 The MLE for the vector $\beta$ would involve the four possible $n$ values. See my edit. — Commented Oct 15, 2018 at 23:26
