loop over variable names

Question

I am trying to build various regression models with different columns (independent variables in my dataset).

    set.seed(0)
    True = rnorm(20, 100, 10)
    v = matrix(rnorm(120, 10, 3), nrow = 20)
    dt = data.frame(cbind(True, v))
    colnames(dt) = c('True', paste0('ABC', 1:6))

The independent variable I want to use in each model is "ABCi": when i = 1, use ABC1, and so on. Each model is built on the first 80% of the observations, and then I make predictions on the remaining 20%.

I tried this:

    reg.pred = rep(0, ncol(dt))
    for (i in 1:nrow(dt)){
      reg = lm(True~paste0('ABC', i), data = dt[(1:(0.8*nrow(dt))),])
      reg.pred[i] = predict(reg, data = dt[(0.8*nrow(dt)):nrow(dt),])
    }

Not working... giving errors like:

Error in model.frame.default(formula = True ~ paste0("ABC", i), data = dt[(1:(0.8 * : variable lengths differ (found for 'paste0("ABC", i)')

I'm not sure how to retrieve the variable name in a loop. Any suggestions are appreciated!

It looks like you're looping through rows with for (i in 1:nrow(dt)){ instead of columns
– divibisan, Commented Mar 7, 2019 at 17:12

duckmayr · Accepted Answer · 2019-03-07 17:09:37Z

You do not technically need to use as.formula() as @Sonny suggests, but you cannot mix a character representation of the formula and formula notation. So, you need to fix that. However, once you do, you'll notice that there are other issues with your code that @Sonny either did not notice or opted not to address.

Most notably, the line

reg.pred = rep(0, ncol(dt))

implies you want a single prediction from each model, but

predict(reg, data = dt[(0.8*nrow(dt)):nrow(dt),])

implies you want a prediction for each of the observations not in the training set (you'll need a +1 after 0.8*nrow(dt) for that by the way).
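To make the off-by-one concrete, here is a small sketch of how the two index ranges line up for a 20-row data frame. The names n, train_idx, bad_test, and test_idx are illustrative and do not appear in the original code, and the sketch assumes 0.8 * n is a whole number (for other sizes something like floor() would be needed).

```r
n <- 20                       # number of rows, as in the question's data

train_idx <- 1:(0.8 * n)      # rows 1 to 16, used to fit each model
bad_test  <- (0.8 * n):n      # rows 16 to 20: row 16 is also a training row
test_idx  <- (0.8 * n + 1):n  # rows 17 to 20: no overlap with training

intersect(train_idx, bad_test)  # 16
intersect(train_idx, test_idx)  # integer(0)
length(test_idx)                # 4
```

With the +1, the held-out set has exactly four rows, which matches a prediction matrix with 0.2 * nrow(dt) rows.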

I think the following should fix all your issues:

    set.seed(0)
    True = rnorm(20, 100, 10)
    v = matrix(rnorm(120, 10, 3), nrow = 20)
    dt = data.frame(cbind(True, v))
    colnames(dt) = c('True', paste0('ABC', 1:6))

    # Make a matrix for the predicted values; each column is for a model
    reg.pred = matrix(0, nrow = 0.2*nrow(dt), ncol = ncol(dt)-1)
    for (i in 1:(ncol(dt)-1)){
      # Get the name of the predictor we want here
      this_predictor <- paste0("ABC", i)
      # Make a character representation of the lm formula
      lm_formula <- paste("True", this_predictor, sep = "~")
      # Run the model
      reg = lm(lm_formula, data = dt[(1:(0.8*nrow(dt))),])
      # Get the appropriate test data
      newdata <- data.frame(dt[(0.8*nrow(dt)+1):nrow(dt), this_predictor])
      names(newdata) <- this_predictor
      # Store predictions
      reg.pred[ , i] = predict(reg, newdata = newdata)
    }

    reg.pred
    #          [,1]     [,2]     [,3]      [,4]      [,5]     [,6]
    # [1,] 100.2150 100.8394 100.7915  99.88836  97.89952 105.7201
    # [2,] 101.2107 100.8937 100.9110 103.52487 102.13965 104.6283
    # [3,] 100.0426 101.0345 101.2740 100.95785 102.60346 104.2823
    # [4,] 101.1055 100.9686 101.5142 102.56364 101.56400 104.4447

In this matrix of predictions, each column is from a different model, and the rows correspond to the last four rows of your data (the rows not in your training set).
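As an alternative sketch (not the answerer's code), base R's reformulate() builds the formula object directly from a character vector of predictor names, and sapply() can collect the predictions without preallocating a matrix. The names n_train, train, test, predictors, and pred are chosen here for illustration; the data is regenerated exactly as in the question.

```r
set.seed(0)
True <- rnorm(20, 100, 10)
v <- matrix(rnorm(120, 10, 3), nrow = 20)
dt <- data.frame(cbind(True, v))
colnames(dt) <- c("True", paste0("ABC", 1:6))

# Split once: first 80% of rows for fitting, the rest for prediction
n_train <- floor(0.8 * nrow(dt))
train <- dt[seq_len(n_train), ]
test  <- dt[(n_train + 1):nrow(dt), ]

# Every column except the response is a candidate predictor
predictors <- setdiff(names(dt), "True")

# One model per predictor; each call returns the held-out predictions
pred <- sapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = "True"), data = train)
  predict(fit, newdata = test)
})

dim(pred)  # 4 rows (held-out observations) by 6 columns (models)
```

Looping over the column names themselves, rather than over an index pasted onto "ABC", also keeps the code working if the predictors are later renamed.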

Thank you so much! Yes, I didn't realize the errors in my original code, and now it works perfectly. Really appreciated!

Sonny · Accepted Answer · 2019-03-07 16:58:11Z

You can use as.formula

    f <- as.formula(paste("True", paste0('ABC', i), sep = " ~ "))
    reg = lm(f, data = dt[(1:(0.8*nrow(dt))),])
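A short sketch of why the formula fix alone may not be enough for the original loop: predict() for lm objects takes its new observations through newdata, so an argument named data is silently ignored and the fitted values for the training rows come back instead. The names train, test, and fit below are illustrative, and only the first predictor is used.

```r
set.seed(0)
True <- rnorm(20, 100, 10)
v <- matrix(rnorm(120, 10, 3), nrow = 20)
dt <- data.frame(cbind(True, v))
colnames(dt) <- c("True", paste0("ABC", 1:6))

train <- dt[1:16, ]
test  <- dt[17:20, ]

f <- as.formula(paste("True", "ABC1", sep = " ~ "))
fit <- lm(f, data = train)

# `data` is not an argument of predict.lm, so this returns the 16 fitted
# values for the training rows rather than predictions for `test`
length(predict(fit, data = test))     # 16

# `newdata` is the argument that supplies the held-out rows
length(predict(fit, newdata = test))  # 4
```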


duckmayr Over a year ago

Did you try running her code with that modification?
