How do I find out which distribution and link function caret's train function uses when fitting a model?

I have a dataset (My_Data) with house values (Value_House) and house characteristics (the covariates). The idea is to predict the value of a house from its covariates. I want to use R to select features by minimizing the AIC, so I use the caret package with the call:

My_model <- train(Form=Value_House~.,data=My_Data,na.action=na.pass,method="glmStepAIC") 

I get a model, and everything seems reasonable. However, I have been trying to find out which distribution and link function were used in the fitting process, and I do not see this anywhere in the returned object or the help pages. Could someone tell me how I can see that information? Is it possible to change the defaults?

    I would assume it uses the normal distribution and the identity link since that is the default of glm. I believe train should pass any family argument to glm. Commented Jan 15 at 10:12

1 Answer

I agree that this could be better documented. When used for regression, train() uses gaussian() as the family (i.e., a linear model with an identity link), and when used for binary classification, it uses binomial(), which defaults to logistic regression. The family argument can be passed directly to train() as it would be to glm() to specify exactly which family you want to use. For example, you could supply family = poisson() to request a Poisson regression model with a log link, or family = binomial("probit") to request a probit regression model.
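As a minimal sketch of the above, the following fits the model once with the default and once with an explicit family, then reads the family and link back from the fitted object. The simulated My_Data (with made-up covariates Area and Rooms) is only a stand-in for the real data, and Gamma(link = "log") is an illustrative choice, not a recommendation for this problem. It also assumes that finalModel is the glm object returned by the stepwise search, so that its family component is available.

```r
library(caret)

set.seed(1)
n <- 200

# Hypothetical stand-in for My_Data: two made-up covariates and a
# strictly positive outcome simulated from a Gamma distribution.
My_Data <- data.frame(
  Area  = runif(n, 50, 250),
  Rooms = sample(1:6, n, replace = TRUE)
)
mu <- exp(4 + 0.005 * My_Data$Area + 0.05 * My_Data$Rooms)
My_Data$Value_House <- rgamma(n, shape = 10, rate = 10 / mu)

# Default: no family supplied, numeric outcome.
default_model <- train(Value_House ~ ., data = My_Data,
                       method = "glmStepAIC")

# Explicit family, forwarded through train()'s ... argument.
gamma_model <- train(Value_House ~ ., data = My_Data,
                     method = "glmStepAIC",
                     family = Gamma(link = "log"))

# The fitted glm object stores the family that was actually used.
default_family <- default_model$finalModel$family
gamma_family   <- gamma_model$finalModel$family

print(c(default_family$family, default_family$link))
print(c(gamma_family$family, gamma_family$link))
```

The stepwise search prints its progress for every resample, so expect a fair amount of console output before the two final lines.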

I learned this by using caret::getModelInfo("glmStepAIC"), which produces an object with a fit component containing the function used to fit the model. Reading the code, I can see that if family is not supplied, it defaults to the values I mentioned above. The relevant part of the code is:

if (!any(names(theDots) == "family")) {
  glmArgs$family <- if (is.factor(y)) binomial() else gaussian()
} else glmArgs$family <- theDots$family

These arguments are eventually passed to glm(), as @Roland suggested in the comment to the OP.
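For anyone who wants to repeat this inspection, here is a rough sketch. It assumes getModelInfo() returns a list keyed by model name (hence the extra [["glmStepAIC"]] step) and uses regex = FALSE so that only the exact model code is matched. The variable names info and src are just for illustration.

```r
library(caret)

# Look up the model definition by its exact code rather than by
# regular-expression match.
info <- getModelInfo("glmStepAIC", regex = FALSE)[["glmStepAIC"]]

# The fit component is the function train() calls to build the model.
print(info$fit)

# Narrow the source down to the lines that mention 'family'.
src <- deparse(info$fit)
print(grep("family", src, value = TRUE))
```

The same approach should work for other caret methods: swap in the method name and read the fit function to see which defaults it sets.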

1 Comment

Thank you very much for this information. Very helpful! :)
