variable scope & resolution in R function

Question

I want to loop through the variables in a data frame, calling lm() on each one, so I wrote this:

    findvars <- function(x = samsungData, dv = 'activity', id = 'subject') {
      # Loops through the possible predictor vars, does an lm() predicting the dv
      # from each, and returns a data.frame of coefficients, one row per IV.
      r <- data.frame()
      # All varnames apart from the dependent var, and the case identifier
      ivs <- setdiff(names(x), c(dv, id))
      for (iv in ivs) {
        print(paste("trying", iv))
        m <- lm(dv ~ iv, data = x, na.rm = TRUE)
        # Take the absolute value of the coefficient, then transpose.
        c <- t(as.data.frame(sapply(m$coefficients, abs)))
        c$iv <- iv # which IV produced this row?
        r <- c(r, c)
      }
      return(r)
    }

This doesn't work, I believe because the formula in the lm() call consists of function-local variables that hold strings naming variables in the passed-in data frame (e.g., "my_dependant_var" and "this_iv"), as opposed to pointers to the actual variable objects.
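A minimal sketch of the distinction, using column names borrowed from the repro data further down (`CONT`, `INTG`) as stand-in values for `dv` and `iv`: written literally, `dv ~ iv` mentions the symbols `dv` and `iv` themselves, so `lm()` looks for variables literally called `dv` and `iv`, first in the data and then in the formula's environment (the function frame), where it finds the strings. Base R's `reformulate()` is one way to build the formula from the strings instead.

```r
# Stand-in values; in findvars() these come from the dv argument and the loop.
dv <- "CONT"
iv <- "INTG"

# Written literally, the formula refers to the symbols `dv` and `iv`,
# not to the column names stored in them.
f_literal <- dv ~ iv
all.vars(f_literal)          # "dv" "iv"

# reformulate() turns the strings into symbols named after the columns.
f_built <- reformulate(iv, response = dv)
all.vars(f_built)            # "CONT" "INTG"
```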

I tried wrapping that formula in eval(parse(text = )), but could not get that to work.
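The exact `eval(parse(text = ))` attempt isn't shown, so this is only a sketch of that route in isolation, again with stand-in values for `dv` and `iv`: parsing and evaluating the pasted text does produce a formula, and `as.formula()` reaches the same result without the `parse()`/`eval()` round trip.

```r
dv <- "CONT"
iv <- "INTG"
txt <- paste(dv, "~", iv)              # "CONT ~ INTG"

# Evaluating the parsed `~` call creates a formula object.
f_parse <- eval(parse(text = txt))

# as.formula() converts the string directly.
f_direct <- as.formula(txt)

inherits(f_parse, "formula")           # TRUE
deparse(f_direct)                      # "CONT ~ INTG"
```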

If I'm right about the problem, can someone explain to me how to get R to resolve the contents of those vars iv & dv into the pointers I need? Or if I'm wrong, can someone explain what else is going on?

Many thanks!

Here is some repro code:

    library(datasets)
    data(USJudgeRatings)
    findvars(x = USJudgeRatings, dv = 'CONT', id = 'DILG')
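To see what the loop in `findvars()` iterates over with this repro call, the `setdiff()` step can be run on its own. A small sketch that only inspects the built-in `USJudgeRatings` data:

```r
library(datasets)
data(USJudgeRatings)

dv <- "CONT"
id <- "DILG"

# Every column except the response and the identifier becomes a candidate IV.
ivs <- setdiff(names(USJudgeRatings), c(dv, id))
print(ivs)
length(ivs) == ncol(USJudgeRatings) - 2   # TRUE
```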

In R you should forget about "pointers to the objects". Values are passed ... as values. And the "variables" in formulas are not really "strings". Names and symbols are objects of super-class "language". Character vectors are not.
– IRTFM, Commented Dec 4, 2013 at 5:45
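A short sketch of the distinction this comment draws: `quote()` yields a symbol (class `"name"`, a language object), while `"CONT"` is an ordinary character vector; `as.name()` and `as.character()` convert between the two. The formula at the end is only an illustration of a language object built from symbols.

```r
sym <- quote(CONT)     # a symbol: class "name"
chr <- "CONT"          # a character vector of length 1

class(sym)             # "name"
is.language(sym)       # TRUE
is.language(chr)       # FALSE

# Converting between the two representations
identical(as.name(chr), sym)         # TRUE
identical(as.character(sym), chr)    # TRUE

# A formula is itself a language object built from symbols:
# element 1 is `~`, element 2 the response, element 3 the right-hand side.
f <- CONT ~ INTG
class(f[[2]])          # "name"
```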

joran · Accepted Answer · 2013-12-04 04:29:58Z

So there's enough bad stuff happening in your function besides your trouble with the formula that I think someone should walk you through it all. Here are some annotations, followed by a better version:

    findvars <- function(x = samsungData, dv = 'activity', id = 'subject') {
      #For small examples, "growing" objects isn't a huge deal,
      # but you will regret it very, very quickly. It's a bad
      # habit. Learn to ditch it now. So don't initialize
      # empty lists and data frames and then append to them.
      r <- data.frame()
      ivs <- setdiff(names(x), c(dv, id))
      for (iv in ivs) {
        print(paste("trying", iv))
        #There is no na.rm argument to lm, only na.action
        m <- lm(dv ~ iv, data = x, na.rm = TRUE)
        #Best not to name variables c, it's a common function, see two lines from now!
        # Also, use the coef() extractor function, not $. That way, if/when
        # authors change the object structure your code won't break.
        #Finally, abs is vectorized, no need for sapply
        c <- t(as.data.frame(sapply(m$coefficients, abs)))
        #This is probably best stored as the element's name
        c$iv <- iv # which IV produced this row?
        #Growing objects == bad! Also, are you sure you know what happens when
        # you concatenate two data frames?
        r <- c(r, c)
      }
      return(r)
    }
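A few of these annotations can be checked directly. The sketch below uses the `USJudgeRatings` data from the question's repro code; since the exact wording of the `lm()` warning may differ between R versions, it only checks that a warning is raised.

```r
library(datasets)

# An unknown argument such as na.rm is passed through to lm.fit(),
# which warns that it is disregarded instead of dropping NAs.
got_warning <- FALSE
withCallingHandlers(
  lm(CONT ~ INTG, data = USJudgeRatings, na.rm = TRUE),
  warning = function(w) {
    got_warning <<- TRUE
    invokeRestart("muffleWarning")
  }
)
got_warning                                # TRUE

fit <- lm(CONT ~ INTG, data = USJudgeRatings, na.action = na.omit)

# coef() is the documented extractor; abs() works on the whole vector at once.
identical(coef(fit), fit$coefficients)     # TRUE
abs(coef(fit))

# c() on two data frames does not stack rows; it returns a plain list of columns.
stacked <- c(data.frame(x = 1), data.frame(x = 2))
class(stacked)                             # "list"
length(stacked)                            # 2
```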

Try something like this instead:

    findvars <- function(x, dv, id) {
      ivs <- setdiff(names(x), c(dv, id))
      #initialize result list of the appropriate length
      result <- setNames(vector("list", length(ivs)), ivs)
      for (i in seq_along(ivs)) {
        result[[i]] <- abs(coef(lm(paste(dv, ivs[i], sep = "~"), data = x, na.action = na.omit)))
      }
      result
    }
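A usage sketch with the repro data. `findvars()` is the function above, repeated so the block runs on its own; the step that flattens the result list into one row per IV (the `coef_df` name and its `intercept`/`slope` columns) is a hypothetical addition aimed at the data frame the question originally wanted.

```r
library(datasets)

findvars <- function(x, dv, id) {
  ivs <- setdiff(names(x), c(dv, id))
  # Preallocate one list slot per IV instead of growing an object.
  result <- setNames(vector("list", length(ivs)), ivs)
  for (i in seq_along(ivs)) {
    # lm() accepts the "dv~iv" string as its formula.
    result[[i]] <- abs(coef(lm(paste(dv, ivs[i], sep = "~"),
                               data = x, na.action = na.omit)))
  }
  result
}

res <- findvars(x = USJudgeRatings, dv = "CONT", id = "DILG")

# Hypothetical post-processing: each element is c("(Intercept)" = ..., <iv> = ...),
# so take the two values by position and stack them, one row per IV.
coef_df <- data.frame(
  iv        = names(res),
  intercept = unname(vapply(res, function(b) b[[1]], numeric(1))),
  slope     = unname(vapply(res, function(b) b[[2]], numeric(1)))
)
print(coef_df)
```

Preallocating the list and filling it by index avoids the repeated copying that growing a data frame inside the loop would cause; the list is only turned into a data frame once, at the end.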

Holy cow, thank you so much for breaking all that down! That of course works, and I have some inkling why it works, at least I think so. I'm comforted to know that lm() will take a string for the formula argument. I guess part of my problem was wrapping my paste() call in eval()? Or something. At any rate, thanks for being so generous w/your time!
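A small sketch of both points in this comment, using the repro data: `lm()` accepts a character formula (it appears to be coerced to a formula when `lm()` builds its model frame), and `eval()` on a string simply returns the string, so wrapping `paste()` in `eval()` neither helps nor hurts; `parse()` is what turns text into code. One side effect worth knowing: the stored call records the expression that was passed, not the expanded formula.

```r
library(datasets)

f_str <- "CONT ~ INTG"

# eval() on a character vector just returns it; it is not parsed as code.
identical(eval(f_str), f_str)              # TRUE

# lm() accepts the string and fits the same model as the literal formula.
fit_str  <- lm(f_str, data = USJudgeRatings)
fit_form <- lm(CONT ~ INTG, data = USJudgeRatings)
all.equal(coef(fit_str), coef(fit_form))   # TRUE

# The recorded call keeps the unevaluated argument (the symbol f_str).
fit_str$call$formula                       # f_str
```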
