I am looking at using the lasso as a method for selecting features and fitting a predictive model with a binary target. Below is some code I was playing with to try out the method with regularized logistic regression.
My question: I get a group of "significant" variables, but can I rank-order them to estimate the relative importance of each? Can the coefficients be standardized so that they can be ranked by absolute value? (I understand that the coef function reports them on the original variable scale.) If so, how would I do that, for example using the standard deviations of x and y as described in "Standardize Regression Coefficients"?
SAMPLE CODE:
    library(glmnet)

    # data comes from
    # http://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
    datasetTest <- read.csv('C:/Documents and Settings/E997608/Desktop/wdbc.data.txt', header=FALSE)

    # recode the target: malignant -> "0", benign -> "1"
    # (per the glmnet docs, the last factor level, here "1", is the target class)
    datasetTest$V2 <- as.factor(ifelse(as.character(datasetTest$V2)=="M","0","1"))

    # cross validation to find the optimal lambda
    # using the lasso because alpha=1
    cv.result <- cv.glmnet(
      x=as.matrix(datasetTest[,3:ncol(datasetTest)]),
      y=datasetTest[,2],
      family="binomial",
      nfolds=10,
      type.measure="deviance",
      alpha=1
    )

    # values of lambda used
    hist(cv.result$lambda)

    # plot of the error measure (here deviance), with error bars
    # computed across the 10 folds, for each value of log(lambda)
    plot(cv.result)

    # the mean cross-validation error (one for each of the 100 values of lambda)
    cv.result$cvm

    # the value of lambda that minimizes the error measure
    # result: 0.001909601
    cv.result$lambda.min
    log(cv.result$lambda.min)

    # the largest value of lambda whose error is within 1 SE of the minimum
    # result: 0.007024236
    cv.result$lambda.1se

    # the full lambda sequence was fit and stored in cv.result$glmnet.fit;
    # this is the same as a call to glmnet directly.
    # here are the coefficients at lambda.1se
    coef(cv.result$glmnet.fit, s=cv.result$lambda.1se)
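To make the question concrete, here is a rough sketch of the rescaling I have in mind; I am not sure it is the right approach. It assumes that multiplying each original-scale coefficient by the standard deviation of its column of x gives a "per standard deviation" effect that can be ranked by absolute value. Since y is binary, only x is rescaled. The helper rank_std_coefs and the toy numbers are made up for illustration; they are not part of glmnet and are not results from the WDBC data.

```r
# Hypothetical helper (not part of glmnet): rescale original-scale
# coefficients to a per-standard-deviation scale and rank them.
rank_std_coefs <- function(beta, x) {
  # beta: named numeric vector of coefficients, intercept already dropped
  # x:    numeric matrix whose column names include names(beta)
  sds <- apply(x[, names(beta), drop = FALSE], 2, sd)
  std_beta <- beta * sds
  # keep only the variables with nonzero coefficients (those the lasso retained)
  std_beta <- std_beta[std_beta != 0]
  # sort by absolute size, largest first
  std_beta[order(abs(std_beta), decreasing = TRUE)]
}

# Toy illustration with made-up coefficients and simulated predictors
set.seed(1)
x_toy <- cbind(a = rnorm(50, sd = 1),
               b = rnorm(50, sd = 100),
               c = rnorm(50, sd = 10))
beta_toy <- c(a = 0.5, b = 0.02, c = 0)
ranked <- rank_std_coefs(beta_toy, x_toy)
print(ranked)

# With the objects from the sample code above, the call might look like this
# (untested sketch; [-1, 1] drops the intercept row and keeps the one column):
#   x    <- as.matrix(datasetTest[, 3:ncol(datasetTest)])
#   beta <- as.matrix(coef(cv.result, s = "lambda.1se"))[-1, 1]
#   rank_std_coefs(beta, x)
```

In the toy data, b has the smallest raw coefficient but ranks first after rescaling because its column varies far more than a's. That scale dependence is exactly why I doubt that ranking the raw coef output is meaningful.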