18

I want to rename one column of a data frame by referring to its current name rather than its position. Is this possible? For example, say I had the table 'data':

a b c
1 2 2
3 2 3
4 1 2

and I wanted to update the name of column b to 'd'. I know I could use

colnames(data)[2] <- 'd' 

but can I make the change by specifically referencing b, i.e. something like

colnames(data)['b'] <- 'd' 

so that if the column ordering of the dataframe changes the correct column name will still be updated.

Thanks in advance

1
  • 3
    Good question! Was trying this: colnames(data['b']) <- 'd', also not good! As Chase points out, this is the way: colnames(data)[colnames(data) == "b"] <- "d" Commented Apr 5, 2014 at 14:15
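As a minimal sketch of the idiom from the comment above, here it is applied to the table from the question. The numeric values are copied from the question; the column reordering step is only there to illustrate that the rename no longer depends on position.

```r
# The table from the question
data <- data.frame(a = c(1, 3, 4), b = c(2, 2, 1), c = c(2, 3, 2))

# Reorder the columns to show that position does not matter
data <- data[, c("c", "b", "a")]

# Rename by matching the current name instead of using an index
names(data)[names(data) == "b"] <- "d"

names(data)
# [1] "c" "d" "a"
```

The logical vector `names(data) == "b"` picks out the column wherever it currently sits, so the same line keeps working after the columns are rearranged.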

5 Answers

30

There is a function setnames built into package data.table for exactly that.

setnames(DT, "b", "d") 

It changes the names by reference with no copy at all. Any other method using names(data)<- or names(data)[i]<- or similar will copy the entire object, usually several times, even though all you're doing is changing a column name.

DT must be type data.table for setnames to work, though. So you'd need to switch to data.table or convert using as.data.table, to use it.
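A minimal sketch of that conversion route, assuming the data.table package is installed and using the table from the question as input:

```r
library(data.table)

data <- data.frame(a = c(1, 3, 4), b = c(2, 2, 1), c = c(2, 3, 2))

# as.data.table() returns a converted copy; the original data frame is untouched
DT <- as.data.table(data)

# Rename by name, in place: the result is not assigned back to DT
setnames(DT, "b", "d")

names(DT)
# [1] "a" "d" "c"
```

Note that `as.data.table()` itself makes one copy. If that matters, `setDT(data)` converts the existing object by reference instead.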

Here is the extract from ?setnames. The intention is that you run example(setnames) at the prompt and then the comments relate to the copies you see being reported by tracemem.

DF = data.frame(a=1:2,b=3:4)       # base data.frame to demo copies
tracemem(DF)
colnames(DF)[1] <- "A"             # 4 copies of entire object
names(DF)[1] <- "A"                # 3 copies of entire object
names(DF) <- c("A", "b")           # 2 copies of entire object
`names<-`(DF,c("A","b"))           # 1 copy of entire object
x=`names<-`(DF,c("A","b"))         # still 1 copy (so not print method)
# What if DF is large, say 10GB in RAM. Copy 10GB just to change a column name?

DT = data.table(a=1:2,b=3:4,c=5:6)
tracemem(DT)
setnames(DT,"b","B")               # by name; no match() needed. No copy.
setnames(DT,3,"C")                 # by position. No copy.
setnames(DT,2:3,c("D","E"))        # multiple. No copy.
setnames(DT,c("a","E"),c("A","F")) # multiple by name. No copy.
setnames(DT,c("X","Y","Z"))        # replace all. No copy.

19 Comments

But is loading a new package worth all the hassle for the sake of a simple column rename? =)
Absolutely. It can make the difference between running out of memory or not. And it's shorter, easier, and leaves slightly less chance of bugs.
@Tyler There are two (rather long) threads on r-devel about this: speeding up perception and (perhaps most relevant) confused about NAMED and probably others.
@Tyler Now on these benchmarks that show data.table is slower, can you point me to just one please?
@MatthewDowle -- Just added one more tracemem test to your example, just b/c it's kind of hilarious how variable R's behavior is, and b/c I kind of like the count down of 4, 3, 2, 1, ... data.table .
16

As of October 2014 this can now be done easily in the dplyr package:

rename(data, d = b) 
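A short sketch of how that call is typically used, assuming the dplyr package is installed. The argument order is new_name = old_name, and the result has to be assigned because `rename()` returns a modified copy rather than changing `data` in place.

```r
library(dplyr)

data <- data.frame(a = c(1, 3, 4), b = c(2, 2, 1), c = c(2, 3, 2))

# new name on the left, current name on the right
data <- rename(data, d = b)

names(data)
# [1] "a" "d" "c"
```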


13

This seems like a hack, but the first thing that came to mind was to use grepl() with a search string detailed enough to match only the column you want. I'm sure there are better options:

dat <- data.frame(a = 1:3, b = 1:3, c = 1:3)
colnames(dat)[grepl("b", colnames(dat))] <- "foo"
dat
#------
  a foo c
1 1   1 1
2 2   2 2
3 3   3 3

As Joran points out below, I overcomplicated things...no need for a regex at all. This saves a few characters on the typing too.

colnames(dat)[colnames(dat) == "foo"] <- "bar"
#------
  a bar c
1 1   1 1
2 2   2 2
3 3   3 3

4 Comments

Or you could simply index the column names using colnames(dat) == 'b', but it's going to be circular no matter what you do.
Don't use regexes for simple stuff like this. I'd rather stick with simple == relational operator.
I thought on first glance that Chase used agrep which could have some advantages.
@aL3xa, if you have many similar column prefixes/suffixes to rename, gsub is invaluable. But yeah one isolated case is generally overkill.
4

Yes but it's more difficult (as far as I know) than numeric indexing. I'm going to provide a dirty function that will do this and if you want to see how to do it just tear the function apart line by line:

rename <- function(df, column, new){
    x <- names(df)                               #Did this to avoid typing twice
    if (is.numeric(column)) column <- x[column]  #Take numeric input by indexing
    names(df)[x %in% column] <- new              #What you're interested in
    return(df)
}

#try it out
rename(mtcars, 'mpg', 'NEW')
rename(mtcars, 1, 'NEW')
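One thing worth knowing about the function above: if the requested name is not present, the logical index is all FALSE and the data frame comes back unchanged without any warning. A possible stricter variant is sketched below. The name `rename_col` and its error message are made up for illustration and are not part of any package.

```r
# Hypothetical helper: rename exactly one column, identified by its current name
rename_col <- function(df, old, new) {
  hit <- names(df) == old
  if (sum(hit) != 1) {
    stop("expected exactly one column named '", old, "', found ", sum(hit))
  }
  names(df)[hit] <- new
  df
}

renamed <- rename_col(mtcars, "mpg", "NEW")
names(renamed)[1]

# A name that is not present raises an error instead of silently doing nothing
result <- tryCatch(rename_col(mtcars, "nope", "NEW"),
                   error = function(e) "error")
```

Like the original, this returns a modified copy, so `mtcars` itself keeps its column names.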


1

I disagree with @Chase - the grepl solution ain't the luckiest one. I'd say: go with simple ==. Here's why:

d <- data.frame(matrix(rnorm(100), 10))
colnames(d) <- replicate(10, paste(sample(letters[1:5], size = 5, replace=TRUE,
                                          prob=c(.1, .6, .1, .1, .1)), collapse = ""))

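Now try doing grepl("b", colnames(d)): it matches every name that contains a "b", not just a column named exactly "b". Either anchor the pattern as "^b$" (note that fixed = TRUE alone still matches substrings), or even better do simple colnames(d) == "b" like @joran suggested. Regex matching is also generally slower than ==, so for simple tasks like this you may want to use simple ==.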
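To make the difference concrete, here is a small sketch with made-up column names (the vector `nms` is purely illustrative). It compares an unanchored pattern, a fixed-string match, an anchored pattern, and plain `==`.

```r
nms <- c("a", "b", "ab", "bcd")

m_regex    <- grepl("b", nms)                # TRUE for every name containing "b"
m_fixed    <- grepl("b", nms, fixed = TRUE)  # literal match, but still a substring match
m_anchored <- grepl("^b$", nms)              # anchored pattern matches only "b"
m_exact    <- nms == "b"                     # exact comparison, no regex involved

nms[m_regex]     # "b" "ab" "bcd"
nms[m_exact]     # "b"
```

Only the anchored pattern and `==` single out the one column you meant to rename; the other two would rename three columns at once.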

2 Comments

I think I pointed out in my answer that I was sure there are better answers, specifically the part I'm sure there are better options. As Joran pointed out in the comments, directly using == is better, which I recognize and show an example of in my answer now too :) I'll leave the top half for posterity's sake.
This answer is essentially the same as mine in that I use colnames(d) %in% "b". In this case they're doing the same thing, though I suppose the == will be faster.
