Connections between subset selection success and logistic distributions

Question

I'm a novice with stats, so if I'm missing something obvious, don't sue me. I was recently working on an assignment where I was tasked with analyzing the following model with subset selection: $$ y_i = \beta_1 x_{1,i} + \beta_2 x_{2,i} + \beta_3 x_{3,i} + \beta_4 x_{4,i} + u_i, $$ where $$(x_{k,i})_{k=1}^4 \sim \mathcal{N}(0, \Sigma),$$ $$ \Sigma = \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & \gamma & \gamma\\ 0 & \gamma & 1 & \gamma\\ 0 & \gamma & \gamma & 1 \end{pmatrix}, $$ and $\gamma \in [-1, 1]$. Additionally, $\beta_2 = \beta_3 = \beta_4 = 2$ and $\beta_1 > 2$. Finally, the $u_i$ are i.i.d. normal with mean zero and variance 2, for $i = 1, \ldots, N$.
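A minimal NumPy sketch for drawing one sample from this model; the names `make_sigma`, `is_valid_covariance` and `simulate` are illustrative, not from the assignment. One thing worth checking first: the lower-right $3 \times 3$ block of $\Sigma$ is an equicorrelation matrix with eigenvalues $1 + 2\gamma$ and $1 - \gamma$ (twice), so $\Sigma$ is a valid (positive semidefinite) covariance matrix only for $\gamma \in [-1/2, 1]$. The sketch checks this numerically before sampling.

```python
import numpy as np

def make_sigma(gamma):
    """Covariance of (x1, x2, x3, x4): x1 independent, x2..x4 equicorrelated with gamma."""
    sigma = np.full((4, 4), float(gamma))
    sigma[0, :] = 0.0
    sigma[:, 0] = 0.0
    np.fill_diagonal(sigma, 1.0)
    return sigma

def is_valid_covariance(sigma, tol=1e-12):
    """A covariance matrix must be positive semidefinite (all eigenvalues >= 0)."""
    return bool(np.linalg.eigvalsh(sigma).min() >= -tol)

def simulate(n, beta1, gamma, rng):
    """Draw (X, y) with y = X @ (beta1, 2, 2, 2) + u, u ~ N(0, 2) (variance 2)."""
    sigma = make_sigma(gamma)
    if not is_valid_covariance(sigma):
        raise ValueError(f"gamma={gamma} gives a non-PSD Sigma (need gamma >= -1/2)")
    beta = np.array([beta1, 2.0, 2.0, 2.0])
    X = rng.multivariate_normal(np.zeros(4), sigma, size=n)
    u = rng.normal(0.0, np.sqrt(2.0), size=n)  # numpy takes the sd, not the variance
    return X, X @ beta + u

X, y = simulate(500, beta1=4.0, gamma=0.3, rng=np.random.default_rng(1))
```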

We were asked to identify the thresholds in $\gamma$ and $\beta_1$ at which subset selection, with subset size $1$, correctly identifies the most important covariate, namely $x_1$ (the one with coefficient $\beta_1$).
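A sketch of the selection step, assuming "subset selection" here means exhaustive best-subset selection ranked by residual sum of squares, with no intercept since the model has none. With a single regressor $x_k$, $\mathrm{RSS}_k = y^\top y - (x_k^\top y)^2 / (x_k^\top x_k)$, so the best single covariate is the one with the largest $(x_k^\top y)^2 / (x_k^\top x_k)$, i.e. the largest squared uncentred correlation with $y$. The function name `best_single_covariate` is hypothetical; a "correct identification" corresponds to it returning `0` (the column for $x_1$).

```python
import numpy as np

def best_single_covariate(X, y):
    """Size-1 best-subset selection without an intercept.

    Regressing y on column x_k alone gives RSS_k = y'y - (x_k'y)^2 / (x_k'x_k);
    the chosen covariate is the one with the smallest RSS_k.
    Returns the 0-based column index (0 corresponds to x1).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    xty = X.T @ y                  # x_k'y for every column k
    xtx = np.sum(X * X, axis=0)    # x_k'x_k for every column k
    rss = y @ y - xty**2 / xtx
    return int(np.argmin(rss))

# Toy check: y depends mostly on the third column, so index 2 should be chosen.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = 5.0 * X_demo[:, 2] + X_demo[:, 0] + rng.normal(size=200)
chosen = best_single_covariate(X_demo, y_demo)
```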

In the process, I derived the following curves for $N = 500$, with $\beta_1 = 4$ for the first one and $\gamma = 0$ for the second.

[Figure: Proportion correctly identified, varying with $\gamma$]

[Figure: Proportion correctly identified, varying with $\beta$]
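Curves like these are typically produced by Monte Carlo: simulate many datasets at each grid point and record how often size-1 selection picks $x_1$. The sketch below is one way to do that, assuming best-subset selection by RSS without an intercept; the function name `proportion_correct`, the number of replications `reps`, the `seed` and the grid are all illustrative choices, since the question does not state them.

```python
import numpy as np

def proportion_correct(beta1, gamma, n=500, reps=500, seed=0):
    """Monte Carlo estimate of P(size-1 best subset selection picks x1).

    reps (datasets per setting) and seed are illustrative choices; the question
    does not say how many replications were used.
    """
    if gamma < -0.5:
        raise ValueError("Sigma is not positive semidefinite for gamma < -1/2")
    rng = np.random.default_rng(seed)
    sigma = np.full((4, 4), float(gamma))
    sigma[0, :] = 0.0
    sigma[:, 0] = 0.0                    # x1 is independent of x2, x3, x4
    np.fill_diagonal(sigma, 1.0)
    beta = np.array([beta1, 2.0, 2.0, 2.0])
    X = rng.multivariate_normal(np.zeros(4), sigma, size=(reps, n))  # (reps, n, 4)
    y = X @ beta + rng.normal(0.0, np.sqrt(2.0), size=(reps, n))     # noise variance 2
    xty = np.einsum("rnk,rn->rk", X, y)  # x_k'y per dataset and column
    xtx = np.einsum("rnk,rnk->rk", X, X) # x_k'x_k per dataset and column
    # Smallest single-regressor RSS <=> largest (x_k'y)^2 / (x_k'x_k).
    picked = np.argmax(xty**2 / xtx, axis=1)
    return float(np.mean(picked == 0))   # column 0 is x1

gammas = np.linspace(0.3, 0.65, 8)       # illustrative grid
curve_gamma = [proportion_correct(4.0, g) for g in gammas]
```

For the $\beta_1$ curve, hold `gamma` fixed and loop over `beta1` instead. Reusing the same `seed` at every grid point gives common random numbers across the grid, which tends to make the estimated curve smoother than independent draws would.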

To my eye, these curves look strikingly logistic! So I fit a simple logistic curve with the following model: $$ f(x) = \frac{A}{1 + \exp(-k(x - x_0))} $$
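For reference, the model above as a function, plus a quick cross-check that avoids a nonlinear solver. Note that $f(x_0) = A/2$, and $k < 0$ gives a decreasing curve. If $A$ is fixed at $1$, then $\operatorname{logit}(p) = k x - k x_0$ is linear in $x$, so an ordinary linear fit of the logit-transformed proportions gives rough values of $k$ and $x_0$. This linearized fit is a swapped-in, cruder technique (it ignores $A$ and weights points differently), not the nonlinear least-squares fit used in the question; it is mainly useful for starting values. The function names are hypothetical.

```python
import numpy as np

def logistic(x, A, x0, k):
    """f(x) = A / (1 + exp(-k (x - x0))): f(x0) = A/2; k < 0 gives a decreasing curve."""
    x = np.asarray(x, dtype=float)
    return A / (1.0 + np.exp(-k * (x - x0)))

def logit_linear_fit(x, p, eps=1e-3):
    """Rough (k, x0) with A fixed at 1: logit(p) = k*x - k*x0 is linear in x.

    Proportions are clipped to [eps, 1 - eps] because logit(0) and logit(1) are infinite.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    slope, intercept = np.polyfit(np.asarray(x, dtype=float), np.log(p / (1.0 - p)), 1)
    return slope, -intercept / slope   # (k, x0)

x_demo = np.linspace(0.3, 0.65, 50)
k_hat, x0_hat = logit_linear_fit(x_demo, logistic(x_demo, 1.0, 0.47, -25.0))
```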

I obtained the following parameters:

The best-fit logistic function for gamma is: A = 1.0011916177515001, x0 = 0.46613345213429164, k = -25.287773424878644
99% confidence intervals (A, x0, k): [0.99834479, 1.00403845], [0.46495465, 0.46731225], [-25.94535303, -24.63019382]

The best-fit logistic function for beta is: A = 1.0008861957715565, x0 = 4.131595820642967, k = 6.469500267867452
99% confidence intervals (A, x0, k): [0.99799143, 1.00378096], [4.12631938, 4.13687226], [6.27646573, 6.66253481]
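These intervals are symmetric around the point estimates (each midpoint matches its estimate to the printed precision), which is consistent with Wald-type intervals $\hat\theta \pm z_{0.995}\,\widehat{\mathrm{SE}}$, for example built from the diagonal of an estimated parameter covariance matrix. The post does not say how they were computed, so treat that as an assumption; under it, the sketch below backs out the implied standard errors. The names `fits`, `implied_se` and `summary` are illustrative.

```python
from statistics import NormalDist

Z99 = NormalDist().inv_cdf(0.995)   # two-sided 99% normal quantile, about 2.576

# Values copied from the output above, in the order (A, x0, k).
fits = {
    "gamma": ([1.0011916177515001, 0.46613345213429164, -25.287773424878644],
              [(0.99834479, 1.00403845), (0.46495465, 0.46731225), (-25.94535303, -24.63019382)]),
    "beta": ([1.0008861957715565, 4.131595820642967, 6.469500267867452],
             [(0.99799143, 1.00378096), (4.12631938, 4.13687226), (6.27646573, 6.66253481)]),
}

def implied_se(lo, hi, z=Z99):
    """If the interval is est +/- z * se (an assumption), its half-width divided by z is se."""
    return (hi - lo) / (2.0 * z)

summary = {}
for name, (estimates, intervals) in fits.items():
    summary[name] = [
        {"est": est, "midpoint": (lo + hi) / 2.0, "se": implied_se(lo, hi)}
        for est, (lo, hi) in zip(estimates, intervals)
    ]
```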

Here are the fitted curves (not informative, just pretty):

I don't fully understand how to measure goodness of fit for nonlinear least squares, but these confidence intervals are very tight, which finally leads me to my question.
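Tight parameter intervals say the parameters are well determined, not that the curve fits well. A few common summaries for a curve fitted to simulated proportions are sketched below: RMSE, $R^2$, and a Pearson chi-square that scales each residual by the binomial variance $p(1-p)/R$ of a proportion estimated from $R$ replications. If the logistic form were exactly right and the grid points independent, the Pearson statistic would be roughly chi-square with (number of grid points minus 3) degrees of freedom. The function `fit_summary` and the argument `reps` ($R$, not stated in the question) are hypothetical.

```python
import numpy as np

def fit_summary(p_obs, p_fit, reps):
    """Goodness-of-fit summaries for a curve fitted to simulated proportions.

    p_obs: observed proportion correct at each grid point
    p_fit: fitted curve evaluated at the same points
    reps:  simulated datasets per grid point (hypothetical; not stated in the question)
    """
    p_obs = np.asarray(p_obs, dtype=float)
    p_fit = np.asarray(p_fit, dtype=float)
    resid = p_obs - p_fit
    rmse = float(np.sqrt(np.mean(resid**2)))
    r2 = float(1.0 - np.sum(resid**2) / np.sum((p_obs - p_obs.mean()) ** 2))
    # Binomial variance of each proportion; clip because A > 1 lets p_fit exceed 1.
    p_c = np.clip(p_fit, 1e-6, 1.0 - 1e-6)
    pearson_chi2 = float(np.sum(resid**2 / (p_c * (1.0 - p_c) / reps)))
    return {"rmse": rmse, "r2": r2, "pearson_chi2": pearson_chi2}

perfect = fit_summary([0.1, 0.5, 0.9], [0.1, 0.5, 0.9], reps=100)
```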

Is there a good theoretical explanation for this behavior? It appears that the probability of correctly identifying a covariate with subset selection follows a logistic curve with these magic parameters. Is this actually the case? Is it asymptotically logistic? Or is it actually just a normal CDF (something I'm realizing as I write this)?

The probit function related to a normal distribution is very close to the logit. Look at the plot in this answer in the context of binary regression, for example, or the plot in this Wikipedia section. Try fitting a probit, also. – EdM, commented Jan 25 at 17:04
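Following the comment's suggestion, a quick numerical check of how close the two curves are: with the classic rescaling constant 1.702, the logistic CDF evaluated at $1.702\,t$ stays within about 0.01 of the standard normal CDF $\Phi(t)$, so Monte Carlo proportions on a grid can hardly tell them apart. The hypothetical helper `probit_sigma_from_logistic_k` converts a fitted logistic slope $k$ into the scale $\sigma \approx 1.702/|k|$ of a normal-CDF curve $A\,\Phi(\pm(x - x_0)/\sigma)$, as a starting point for a probit fit.

```python
import math

def logistic_cdf(t):
    return 1.0 / (1.0 + math.exp(-t))

def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

SCALE = 1.702  # classic constant: logistic_cdf(1.702 * t) is close to normal_cdf(t)

grid = [i / 100.0 for i in range(-800, 801)]
gap_scaled = max(abs(logistic_cdf(SCALE * t) - normal_cdf(t)) for t in grid)
gap_unscaled = max(abs(logistic_cdf(t) - normal_cdf(t)) for t in grid)

def probit_sigma_from_logistic_k(k):
    """Scale sigma such that Phi((x - x0)/sigma) roughly matches 1/(1 + exp(-|k|(x - x0)))."""
    return SCALE / abs(k)
```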
