0

I am trying to build various regression models with different columns (independent variables in my dataset).

set.seed(0)
True = rnorm(20, 100, 10)
v = matrix(rnorm(120, 10, 3), nrow = 20)
dt = data.frame(cbind(True, v))
colnames(dt) = c('True', paste0('ABC', 1:6))

So the independent variable I want to put in each model is "ABCi", i.e. when i=1, use ABC1, and so on. Each model is fit on the first 80% of the observations, and then I make predictions on the remaining 20%.

I tried this:

reg.pred = rep(0, ncol(dt))
for (i in 1:nrow(dt)){
  reg = lm(True~paste0('ABC', i), data = dt[(1:(0.8*nrow(dt))),])
  reg.pred[i] = predict(reg, data = dt[(0.8*nrow(dt)):nrow(dt),])
}

Not working... giving errors like:

Error in model.frame.default(formula = True ~ paste0("ABC", i), data = dt[(1:(0.8 * :
  variable lengths differ (found for 'paste0("ABC", i)')

Not sure how I can retrieve the variable name in a loop... Any suggestions are appreciated!

1
  • It looks like you're looping through rows with for (i in 1:nrow(dt)){ instead of columns. (Commented Mar 7, 2019 at 17:12)

2 Answers

1

You do not technically need to use as.formula() as @Sonny suggests, but you cannot mix a character representation of the formula and formula notation. So, you need to fix that. However, once you do, you'll notice that there are other issues with your code that @Sonny either did not notice or opted not to address.

Most notably, the line

reg.pred = rep(0, ncol(dt)) 

implies you want a single prediction from each model, but

predict(reg, data = dt[(0.8*nrow(dt)):nrow(dt),]) 

implies you want a prediction for each of the observations not in the training set (you'll need a +1 after 0.8*nrow(dt) for that, by the way, and the argument that passes new observations to predict() is newdata, not data).
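As a side note on the predict() call quoted above, here is a minimal sketch of what happens when the new observations are passed as data rather than newdata. The names n_train, train and test are introduced only for this illustration, and the two-column data frame is a cut-down stand-in for the question's data.

```r
set.seed(0)
dt <- data.frame(True = rnorm(20, 100, 10), ABC1 = rnorm(20, 10, 3))

n_train <- floor(0.8 * nrow(dt))         # 16 training rows (illustrative name)
train <- dt[seq_len(n_train), ]
test  <- dt[(n_train + 1):nrow(dt), ]    # the remaining 4 rows

reg <- lm(True ~ ABC1, data = train)

# `data` is not an argument of predict.lm(); it falls into `...` and is ignored,
# so this returns the fitted values for the training rows.
length(predict(reg, data = test))

# `newdata` is the argument that supplies new observations.
length(predict(reg, newdata = test))
```

The first call returns one value per training row and the second one value per held-out row, which is why the loop in the question could not store the result in a single element of reg.pred.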

I think the following should fix all your issues:

set.seed(0)
True = rnorm(20, 100, 10)
v = matrix(rnorm(120, 10, 3), nrow = 20)
dt = data.frame(cbind(True, v))
colnames(dt) = c('True', paste0('ABC', 1:6))

# Make a matrix for the predicted values; each column is for a model
reg.pred = matrix(0, nrow = 0.2*nrow(dt), ncol = ncol(dt)-1)
for (i in 1:(ncol(dt)-1)){
  # Get the name of the predictor we want here
  this_predictor <- paste0("ABC", i)
  # Make a character representation of the lm formula
  lm_formula <- paste("True", this_predictor, sep = "~")
  # Run the model
  reg = lm(lm_formula, data = dt[(1:(0.8*nrow(dt))),])
  # Get the appropriate test data
  newdata <- data.frame(dt[(0.8*nrow(dt)+1):nrow(dt), this_predictor])
  names(newdata) <- this_predictor
  # Store predictions
  reg.pred[ , i] = predict(reg, newdata = newdata)
}

reg.pred
#          [,1]     [,2]     [,3]      [,4]      [,5]     [,6]
# [1,] 100.2150 100.8394 100.7915  99.88836  97.89952 105.7201
# [2,] 101.2107 100.8937 100.9110 103.52487 102.13965 104.6283
# [3,] 100.0426 101.0345 101.2740 100.95785 102.60346 104.2823
# [4,] 101.1055 100.9686 101.5142 102.56364 101.56400 104.4447

In this matrix of predictions, each column is from a different model, and the rows correspond to the last four rows of your data (the rows not in your training set).
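If you prefer not to paste the formula together by hand, one possible variant (a sketch, not part of the fix above) builds it with reformulate() from the stats package and collects the predictions with sapply(). The names n_train, train, test and predictors are made up for this example.

```r
set.seed(0)
True <- rnorm(20, 100, 10)
v <- matrix(rnorm(120, 10, 3), nrow = 20)
dt <- data.frame(cbind(True, v))
colnames(dt) <- c("True", paste0("ABC", 1:6))

n_train <- floor(0.8 * nrow(dt))         # illustrative name: size of the training set
train <- dt[seq_len(n_train), ]
test  <- dt[(n_train + 1):nrow(dt), ]

predictors <- setdiff(names(dt), "True") # every column except the response

# One column of test-set predictions per single-predictor model
reg.pred <- sapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = "True"), data = train)
  predict(fit, newdata = test)
})

dim(reg.pred)
```

Because sapply() keeps the names of predictors, the columns of reg.pred end up labelled ABC1 to ABC6, which makes it easier to tell which model produced which predictions.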


2 Comments

Thank you so much! Yes, I didn't realize the errors in my original code, and now it works perfectly. Really appreciated!
@RachelZhang Great, glad it helped!
1

You can use as.formula:

f <- as.formula(paste("True", paste0('ABC', i), sep = " ~ "))
reg = lm(f, data = dt[(1:(0.8*nrow(dt))),])
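For context, here is a small sketch of why the original formula fails and how the string-based versions behave. It assumes a cut-down data frame with a single predictor; the object names bad, fit_chr and fit_fml are invented for this illustration.

```r
set.seed(0)
dt <- data.frame(True = rnorm(20, 100, 10), ABC1 = rnorm(20, 10, 3))
i <- 1

# Inside a formula, paste0("ABC", i) is evaluated as an ordinary expression.
# It yields a length-1 character vector rather than the column ABC1,
# which is what triggers "variable lengths differ".
bad <- try(lm(True ~ paste0("ABC", i), data = dt), silent = TRUE)
inherits(bad, "try-error")

# Building the whole formula as a string works; lm() coerces it to a formula.
fit_chr <- lm(paste0("True ~ ABC", i), data = dt)

# Converting explicitly with as.formula() gives the same fit.
fit_fml <- lm(as.formula(paste0("True ~ ABC", i)), data = dt)

all.equal(coef(fit_chr), coef(fit_fml))
```

Either form resolves the formula error, but the loop bounds and the predict() call in the question still need to be fixed separately.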

1 Comment

Did you try running her code with that modification?
