$\begingroup$

I'm a novice with stats, so if I'm missing something obvious, don't sue me. I was recently working on an assignment where I was tasked with analyzing the following model with subset selection:

$$ y_i = \beta_1 x_{1, i} + \beta_2 x_{2,i} + \beta_3 x_{3,i} + \beta_4 x_{4,i} + u_i $$

where

$$(x_{k,i})_{k=1}^4 \sim \mathcal{N}(0, \Sigma)$$ $$ \Sigma = \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & \gamma & \gamma\\ 0 & \gamma & 1 & \gamma\\ 0 & \gamma & \gamma & 1 \end{pmatrix} $$

and $\gamma \in [-1, 1]$. Additionally, $\beta_2 = \beta_3 = \beta_4 = 2$ and $\beta_1 > 2$. Finally, the $u_i$ are i.i.d. normally distributed with mean zero and variance 2, and $i = 1, \ldots, N$.

We were tasked with identifying the thresholds of $\gamma$ and $\beta_1$ for which subset selection, with the size of the subset equal to $1$, correctly identifies the most important covariate, namely $x_1$ (the one with the largest coefficient, $\beta_1$).
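For concreteness, the experiment described above could be simulated roughly as follows. This is a minimal sketch, not the code used for the assignment: the function names, the number of replications `reps`, and the selection rule (pick the single covariate whose no-intercept regression has the smallest residual sum of squares) are all assumptions, so it may not reproduce the numbers reported below. Note also that $\Sigma$ is a valid covariance matrix only for $\gamma \in [-0.5, 1]$, so the sketch should be called with values in that range.

```python
import numpy as np


def make_sigma(gamma):
    """Covariance matrix from the question: x1 independent, x2..x4 equicorrelated."""
    sigma = np.eye(4)
    sigma[1:, 1:] = gamma          # fill the lower-right 3x3 block with gamma
    np.fill_diagonal(sigma, 1.0)   # restore unit variances on the diagonal
    return sigma


def proportion_correct(beta1, gamma, n=500, reps=200, seed=0):
    """Share of simulated data sets in which size-1 subset selection picks x1.

    Assumption: "subset selection of size 1" means choosing the covariate
    whose single-variable, no-intercept regression has the lowest RSS.
    """
    rng = np.random.default_rng(seed)
    beta = np.array([beta1, 2.0, 2.0, 2.0])
    sigma = make_sigma(gamma)
    hits = 0
    for _ in range(reps):
        x = rng.multivariate_normal(np.zeros(4), sigma, size=n)
        u = rng.normal(0.0, np.sqrt(2.0), size=n)   # error variance is 2
        y = x @ beta + u
        # RSS_k = y'y - (x_k'y)^2 / (x_k'x_k), so minimizing RSS is the
        # same as maximizing the second term.
        score = (x.T @ y) ** 2 / np.sum(x ** 2, axis=0)
        hits += int(np.argmax(score) == 0)
    return hits / reps
```

Calling `proportion_correct(4.0, g)` over a grid of `g` values, or `proportion_correct(b, 0.0)` over a grid of `b` values, would give curves of the kind discussed next, under the assumptions stated above.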

In this process, I obtained the following curves with $N = 500$: the first fixes $\beta_1 = 4$ and varies $\gamma$, and the second fixes $\gamma = 0$ and varies $\beta_1$.

[Figure: proportion correctly identified, varying with $\gamma$]

[Figure: proportion correctly identified, varying with $\beta_1$]

To my eye, these look strikingly logistic! So, I ran a simple logistic curve fit, with the following model: $$ f(x) = \frac{A}{1 + \exp(-k(x - x_0))} $$
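A fit of this form can be sketched with `scipy.optimize.curve_fit`. The helper name `fit_logistic`, the starting values, and the synthetic demo data are assumptions for illustration, and the intervals here use a normal approximation from the estimated covariance matrix, which is not necessarily how the intervals quoted below were computed.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm


def logistic(x, A, x0, k):
    """Three-parameter logistic curve f(x) = A / (1 + exp(-k (x - x0)))."""
    return A / (1.0 + np.exp(-k * (x - x0)))


def fit_logistic(x, p, p0, level=0.99):
    """Nonlinear least-squares fit with approximate confidence intervals.

    Returns the parameter estimates (A, x0, k) and a (3, 2) array of
    lower/upper bounds based on a normal approximation.
    """
    params, cov = curve_fit(logistic, x, p, p0=p0)
    se = np.sqrt(np.diag(cov))          # asymptotic standard errors
    z = norm.ppf(0.5 + level / 2.0)     # two-sided normal quantile
    ci = np.column_stack([params - z * se, params + z * se])
    return params, ci


# Synthetic demo data (made up, not the data from the question).
rng = np.random.default_rng(0)
x_demo = np.linspace(0.0, 1.0, 101)
p_demo = logistic(x_demo, 1.0, 0.5, -25.0) + rng.normal(0.0, 0.01, x_demo.size)

params_demo, ci_demo = fit_logistic(x_demo, p_demo, p0=[1.0, 0.4, -20.0])
print(params_demo)
print(ci_demo)
```

A negative `k` gives a decreasing curve, which is why the demo starts the optimizer from a negative value.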

I obtained the following parameters:

The best-fit logistic function for gamma is: A = 1.0011916177515001, x0 = 0.46613345213429164, k = -25.287773424878644

99% confidence intervals:

- A: [0.99834479, 1.00403845]
- x0: [0.46495465, 0.46731225]
- k: [-25.94535303, -24.63019382]

The best-fit logistic function for beta is: A = 1.0008861957715565, x0 = 4.131595820642967, k = 6.469500267867452

99% confidence intervals:

- A: [0.99799143, 1.00378096]
- x0: [4.12631938, 4.13687226]
- k: [6.27646573, 6.66253481]

With the following graphs (not informative, just pretty):

[Figure: true gamma vs. logistic prediction]

[Figure: true beta vs. logistic prediction]

I don't fully understand how to measure goodness of fit for non-linear least squares, but these are very tight confidence intervals, so this finally leads me to my question.

Is there a good theoretical explanation for this behavior? It appears that the probability of correctly identifying a covariate with subset selection follows a logistic curve with these magic parameters. Is this actually the case? Is it asymptotically logistic? Or is this actually just a normal CDF (something I'm realizing as I write this)?
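One way to probe the last of these questions empirically would be to fit a scaled normal CDF alongside the logistic curve and compare the residual sums of squares. The sketch below is only an illustration: the function names, the parametrization of the normal-CDF curve, the starting values, and the synthetic demo data are assumptions, and a smaller RSS on one data set is suggestive rather than a proof of the underlying form.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm


def logistic(x, A, x0, k):
    """Logistic curve from the question."""
    return A / (1.0 + np.exp(-k * (x - x0)))


def probit(x, A, x0, s):
    """Scaled normal CDF with location x0 and scale s (assumed parametrization)."""
    return A * norm.cdf((x - x0) / s)


def compare_fits(x, p, p0_logistic, p0_probit):
    """Fit both curves by least squares; return parameters and RSS for each."""
    results = {}
    candidates = (
        ("logistic", logistic, p0_logistic),
        ("probit", probit, p0_probit),
    )
    for name, func, p0 in candidates:
        params, _ = curve_fit(func, x, p, p0=p0)
        rss = float(np.sum((p - func(x, *params)) ** 2))
        results[name] = (params, rss)
    return results


# Synthetic check: noise-free data generated from a normal CDF
# (made-up values, not the simulation results from the question).
x_demo = np.linspace(3.0, 5.5, 201)
p_demo = probit(x_demo, 1.0, 4.13, 0.25)

results_demo = compare_fits(
    x_demo, p_demo, p0_logistic=[1.0, 4.0, 6.0], p0_probit=[1.0, 4.0, 0.3]
)
for name, (params, rss) in results_demo.items():
    print(name, params, rss)
```

On the real simulated proportions, the same comparison would show which of the two shapes tracks the data more closely, although the two curves are close enough that a fine grid and many replications may be needed to tell them apart.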

$\endgroup$
  • The probit function related to a normal distribution is very close to the logit. Look at the plot in this answer in the context of binary regression, for example, or the plot in this Wikipedia section. Try fitting a probit, also. (Commented Jan 25 at 17:04)
