
I'm trying to write a function in R that drops columns from a data frame and returns the new data with a name specified as an argument of the function:

drop <- function(my.data, col, new.data) {
  new.data <<- my.data[, -col]
  return(new.data)
}

So in the above example, I want a new data frame to exist after the function is called that is named whatever the user inputs as the third argument.

When I call the function, the correct data frame is returned, but if I then try to use the new data frame in the global environment I get "object not found". I thought that by using the <<- operator I was defining new.data globally.

Can someone help me understand what's going on and if there is a way to accomplish this?

I found this and this that seemed related, but neither quite answered my question.

  • You could assign(new.data, my.data[,-col], envir = .GlobalEnv), although I would recommend against this whole idea. Commented Mar 14, 2014 at 18:12
  • It looks like your function requires more typing than explicitly doing the call directly. What is the point? Also assigning things using <<- from within a function is terrible practice. Commented Mar 14, 2014 at 18:13
  • You are trying to write a function with a side effect. R is a functional language and thus functions shouldn't have side effects. Commented Mar 14, 2014 at 18:16
  • @Dason ah, good to know that <<- shouldn't be used in a function, thanks. My actual function is longer than this; I was just using this as an easy example. It does save a lot of typing. Commented Mar 14, 2014 at 18:18

2 Answers

Use the assign() function.

 assign("new.data", my.data[,-col], envir = .GlobalEnv) 

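The first argument should be a string. In this case, the resultant global variable will be named "new.data". If new.data is instead a variable that holds the desired name as a string (as with the third argument of your function), drop the quotes from the function call so that the value of new.data is used as the name.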
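As a minimal sketch of the unquoted form, assuming the caller passes the desired name as a character string: the function name `drop_cols`, the argument `new.name`, and the sample data frame `df` are all hypothetical and used only for illustration.

```r
# Hypothetical helper: drops columns and stores the result under a caller-chosen name.
drop_cols <- function(my.data, col, new.name) {
  # drop = FALSE keeps the result a data frame even if only one column remains
  result <- my.data[, -col, drop = FALSE]
  # new.name already holds a string, so it is passed to assign() without quotes
  assign(new.name, result, envir = .GlobalEnv)
  invisible(result)
}

df <- data.frame(a = 1:3, b = 4:6, c = 7:9)  # illustrative data
drop_cols(df, 2, "df.small")
names(df.small)
# [1] "a" "c"
```

Writing `assign("new.name", ...)` with quotes here would instead create a global variable literally called new.name.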

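<<- does not always assign to the global environment. It searches the enclosing environments, starting with the parent of the current one, for an existing variable of that name and assigns to the first one it finds; only if none is found does it create the variable in the global environment. Either way, the variable it sets is literally named new.data, not whatever string was passed as the third argument.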
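The behaviour of <<- can be seen in a small sketch. The names `make_counter`, `drop_literal`, and the sample data are hypothetical; the second half mirrors the function from the question, assuming it is defined at top level.

```r
# Case 1: <<- finds `count` in the enclosing function's environment and
# updates it there, so nothing is written to the global environment.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}
counter <- make_counter()
counter()  # 1
counter()  # 2

# Case 2: the question's pattern. <<- skips the function's own argument,
# finds no `new.data` in any enclosing environment, and so creates a global
# variable literally named `new.data`. The string "trimmed" is never used.
drop_literal <- function(my.data, col, new.data) {
  new.data <<- my.data[, -col, drop = FALSE]
  invisible(NULL)
}
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)  # illustrative data
drop_literal(df, 2, "trimmed")
exists("new.data", envir = .GlobalEnv, inherits = FALSE)  # TRUE
exists("trimmed")  # FALSE in a fresh session
```

This is why the call in the question returns the right data frame but leaves no object under the requested name.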

In general, however, it is better to return things from a function than set global variables from inside a function. The latter is a lot harder to debug.
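A sketch of the return-based alternative, where the caller chooses the name with an ordinary assignment. The function name `drop_cols` and the sample data are hypothetical.

```r
# Hypothetical helper with no side effects: it only returns the trimmed data frame.
drop_cols <- function(my.data, col) {
  # drop = FALSE keeps the result a data frame even if only one column remains
  my.data[, -col, drop = FALSE]
}

df <- data.frame(a = 1:3, b = 4:6, c = 7:9)  # illustrative data

# The caller decides the name at the call site.
df.small <- drop_cols(df, 2)
names(df.small)
# [1] "a" "c"
```

Because nothing is written outside the function, the original `df` is untouched and the result can be inspected or discarded like any other value.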


3 Comments

Thanks this is helpful. I'll live with slightly more typing and just use return.
@JakeBurkhead: I expanded to explain when to quote and when not to.
Is there a systematic way to do this for a pile of variables?

One reason to need this is when working a great deal in the RStudio console on text mining. For example, if you have a large corpus and you want to break it up into sub-corpora based on themes, wrapping the processing in a function that produces a cleaned corpus can be much faster. An example is below:

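library(tm)  # provides Corpus, VectorSource, tm_map and the cleaning transformations

processText <- function(inputText, corpName) {
  # Build a corpus from the input text, then clean it step by step
  outputName <- Corpus(VectorSource(inputText))
  outputName <- tm_map(outputName, PlainTextDocument)
  outputName <- tm_map(outputName, removeWords, stopwords("english"))
  outputName <- tm_map(outputName, removePunctuation)
  outputName <- tm_map(outputName, removeNumbers)
  outputName <- tm_map(outputName, stripWhitespace)
  # Store the cleaned corpus in the global environment under the name in corpName
  assign(corpName, outputName, envir = .GlobalEnv)
  return(corpName)
}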

In the case above, I pass the column from the data frame as inputText and the desired name of the output corpus (as a string) as corpName. A single call like the following then processes a batch of text data:

processText(retail$Essay,"retailCorp") 

Then the new corpus "retailCorp" shows up in the global environment for further work such as plotting word clouds, etc. Also, I can send lists through the function and get lots of corpora back.
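The list case can be sketched with base R only. Note the assumptions: `clean_text` is a hypothetical stand-in for the tm pipeline (it works on plain character vectors, not corpora), and the sample texts and names are made up for illustration.

```r
# Hypothetical stand-in for the tm cleaning steps, using base R only:
# lower-case, strip punctuation and digits, then squeeze whitespace.
clean_text <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:][:digit:]]", "", x)
  trimws(gsub("\\s+", " ", x))
}

# A named list of inputs; the names become the names of the results.
texts <- list(
  retailText = c("Great  prices, 2 stores!", "Slow checkout..."),
  bankText   = c("Fees are 3% too high.")
)

# Process every element and keep the results together in one named list.
cleaned <- lapply(texts, clean_text)
cleaned$retailText
# [1] "great prices stores" "slow checkout"

# If separate global objects are really wanted, list2env() creates one
# variable per list element, named after the list names.
invisible(list2env(cleaned, envir = .GlobalEnv))
```

Keeping the results in a single named list is usually easier to loop over than many separate global objects; the `list2env()` call is only there for the case where separate objects are needed.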
