$\begingroup$

I have an unbalanced dataset of 247 individuals, with 58 events, and I am trying to select features via lasso regression. There is some controversy about my sample size: the lasso is often presented as handling n << p scenarios, but I was previously advised about the limitations of my sample size. The literature also contains many studies with similar sample sizes that use the elastic net for variable selection.

My first doubt is: do I need to account for the unequal proportion of events and controls?

Second, does it make sense to iterate to choose lambda, or is it better to just find alpha and let the function pick lambda? Previous threads recommend finding alpha first and then lambda, rather than optimizing both simultaneously. However, in finding alpha you automatically obtain a lambda value; despite this, we then compute the best lambda separately:

    alpha_grid <- seq(0, 1, by = 0.1)
    cv_errors <- numeric(length(alpha_grid))
    cv_models <- list()

    # Getting the best alpha value (internally obtaining lambda)
    for (i in seq_along(alpha_grid)) {
      a <- alpha_grid[i]
      print(a)
      cv_fit <- cv.glmnet(x, y, family = "cox", alpha = a, nfolds = 5)
      cv_errors[i] <- min(cv_fit$cvm)
      cv_models[[i]] <- cv_fit
    }
    best_index_p6 <- which.min(cv_errors)
    best_alpha_p6 <- alpha_grid[best_index_p6]
    cat("Best alpha (p6):", best_alpha_p6, "\n")

    # Get a better lambda by averaging CV error over many seeds
    # (although each cv.glmnet fit above already yields a lambda)
    n <- 100
    lambdas_p6 <- NULL
    for (i in 1:n) {
      set.seed(i)
      fit <- cv.glmnet(x, y, family = "cox", alpha = best_alpha_p6)
      errors <- data.frame(lambda = fit$lambda, cvm = fit$cvm)
      lambdas_p6 <- rbind(lambdas_p6, errors)
    }
    lambda_summary_p6 <- aggregate(cvm ~ lambda, data = lambdas_p6, mean)
    bestindex_p6 <- which.min(lambda_summary_p6$cvm)
    bestlambda_p6 <- lambda_summary_p6$lambda[bestindex_p6]
    cat("Best average lambda (p6):", bestlambda_p6, "\n")

    final_model_p6 <- glmnet(x, y, family = "cox",
                             alpha = best_alpha_p6, lambda = bestlambda_p6)
    selected_vars_p6 <- coef(final_model_p6)
    selected_vars_p6 <- selected_vars_p6[selected_vars_p6[, 1] != 0, ]
    print("Selected variables (p6):")
    print(selected_vars_p6)
$\endgroup$

1 Answer

$\begingroup$

The fact that many previous small studies using the lasso or elastic net have been published does not justify the practice. These procedures require very large sample sizes and otherwise have a low probability of selecting the right variables.

Don’t use the word “imbalance”, which leads to nothing but bad analytical detours. Just think about the effective sample size, which as described here is $3np(1-p)$ where $p = 58/247$ — about $133$ in your case. As a rough starting point you need $96 + 15$ times the number of candidate features as the sample size to get a reliable analysis; it takes 96 observations just to estimate the intercept in a logistic model. Also, as detailed in that chapter, you’d be better off with unsupervised learning (data reduction) instead of feature selection.
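Plugging the question's own numbers (247 individuals, 58 events) into these two formulas gives a quick sanity check:

```r
# Effective sample size for a binary/censored outcome: 3 * n * p * (1 - p),
# where p is the proportion of events (not the number of variables).
n <- 247
p_events <- 58 / n
n_eff <- 3 * n * p_events * (1 - p_events)

# Rough budget of candidate features: (effective n - 96) / 15
budget <- floor((n_eff - 96) / 15)

c(effective_n = round(n_eff, 1), feature_budget = budget)
```

So the effective sample size is roughly 133, which supports only about 2 candidate predictors under the rule of thumb — far fewer than a typical feature-selection exercise assumes.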

A weakness of typical elastic net practice is that most people do not independently vary the two penalty parameters, so it’s good to see you worrying about this. But your sample size is too small for the data to be able to tell you which penalties to use, so this is all a catch-22.
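One detail worth fixing in the question's loop regardless: each call to `cv.glmnet` re-randomizes the folds, so different alphas are scored on different partitions. A minimal sketch of a joint grid on a common fold assignment (with simulated stand-ins for `x` and `y`, purely for illustration):

```r
library(glmnet)

set.seed(1)
n <- 247; p <- 20
x <- matrix(rnorm(n * p), n, p)             # stand-in design matrix
y <- cbind(time = rexp(n),                  # stand-in survival times
           status = rbinom(n, 1, 58 / 247)) # ~23% events

# Fix the folds once so every alpha is scored on the same partition;
# otherwise fold-to-fold noise, not the data, can drive the choice of alpha.
foldid <- sample(rep(1:5, length.out = n))

alpha_grid <- seq(0, 1, by = 0.25)
fits <- lapply(alpha_grid, function(a)
  cv.glmnet(x, y, family = "cox", alpha = a, foldid = foldid))
cv_min <- sapply(fits, function(fit) min(fit$cvm))

best <- which.min(cv_min)
best_fit <- fits[[best]]
c(alpha = alpha_grid[best], lambda = best_fit$lambda.min)
```

Even with fixed folds, the stability caveat above stands: the whole selection process would need to be bootstrapped before the chosen penalties could be trusted.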

For many situations a Bayesian model with horseshoe priors works better than lasso or elastic net. This has the added benefit of giving you standard uncertainty intervals when finished.
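As one possible template for the horseshoe suggestion — the brms package is assumed here (rstanarm's `hs()` prior is an alternative), and the fit itself requires Stan and is slow, so it is left commented out:

```r
library(brms)

# Horseshoe prior on all regression coefficients; shrinkage can be
# tuned via the arguments of horseshoe() (see ?horseshoe).
hs_prior <- set_prior(horseshoe(df = 1), class = "b")

# Cox model with right censoring, status == 1 meaning an event
# (dat is a hypothetical data frame with time, status, and predictors):
# fit <- brm(time | cens(1 - status) ~ ., data = dat,
#            family = brmsfamily("cox"), prior = hs_prior,
#            chains = 4, cores = 4)
# posterior_summary(fit) then gives a credible interval per coefficient.
```

Unlike the glmnet point estimates, the posterior summaries come with uncertainty intervals at no extra cost, which is the benefit mentioned above.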

$\endgroup$
  • $\begingroup$ Thank you for your answer. So n << p is false advertising. In the future I will collect a larger sample — this is why the proportion of cases to controls concerns me — although I don't believe I will reach that sample size ($3np(1-p)$). Is it then a good approach to sequentially optimize both parameters? In the formula, is $p$ the number of variables, or just 58/247? $\endgroup$ Commented May 23 at 11:56
  • $\begingroup$ $p$ there is the proportion of events. Ideally optimize the 2 penalty parameters over a regular grid, but you’d need to bootstrap that whole process to show the stability of the penalty values found. Think about data reduction instead, e.g., stop separating hard-to-separate predictors. $\endgroup$ Commented May 23 at 14:07
