Rename Dataframe Column Names in R using Previous Column Name and Regex Pattern

Question

I am working in R for the first time and I am having difficulty renaming the columns of a dataframe (Grade.Data). The dataset was imported from a CSV file and has column names like this:

Student.ID Grade Interactive.Exercises.1..Health Interactive.Exercises.2..Fitness Quizzes.1..Week.1.Quiz Quizzes.2..Week.2.Quiz Case.Studies.1..Case.Study1 Case.Studies.2..Case.Study2

I would like to simplify the variable names, e.g. from Interactive.Exercises.1..Health to Interactive.Exercises.1, or from Quizzes.1..Week.1.Quiz to Quizzes.1.

So far, I have tried this:

grep(".*[0-9]", names(Grade.Data))

But I get this returned:

[1] 3 4 5 6 7 8 9 11 12 13 14 15 16 17 19 20 21 22 23 24 25

Can anyone help me figure out what is going on, and write a better regex? Thank you so much.

I think you want names(Grade.Data) <- sub("^(.*[^.])\\..*$", "\\1", names(Grade.Data)). What about Case.Studies.2..Case.Study2, what is the expected output? Also, try the "^(.*[^.])\\.{2}.*" pattern.
– Wiktor Stribiżew, Commented Aug 14, 2017 at 18:38
I would like to have Case.Studies.2..Case.Study2 change to Case.Studies2, from the first half of the string.
– Syrah.Sharpe, Commented Aug 14, 2017 at 18:56

Wiktor Stribiżew · Accepted Answer · 2017-08-14 19:09:08Z

It seems you want to truncate the column names after the first chunk of digits.

You may use the following sub solution:

names(Grade.Data) <- sub("^(.*?\\d+).*$", "\\1", names(Grade.Data))

Details

^ - start of string
(.*?\\d+) - Group 1 (referred to later as \\1 in the replacement pattern), matching any 0+ chars, as few as possible (.*?), and then 1 or more digits (\\d+)
.* - any 0+ chars as many as possible
$ - end of string
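
As a quick check of the pattern above, here is a minimal sketch on a made-up data frame. The Grade.Data built below is only a stand-in that reuses column names from the question (the real data is assumed to have the same header shape). The `perl = TRUE` argument and the duplicate check are additions for illustration and are not part of the answer's call.

```r
# Hypothetical stand-in for the real data: one row, names taken from the question.
old_names <- c("Student.ID", "Grade",
               "Interactive.Exercises.1..Health",
               "Quizzes.1..Week.1.Quiz",
               "Case.Studies.2..Case.Study2")
Grade.Data <- as.data.frame(matrix(0, nrow = 1, ncol = length(old_names)))
names(Grade.Data) <- old_names

# Keep everything up to and including the first run of digits.
# Names without digits do not match, so sub() returns them unchanged.
new_names <- sub("^(.*?\\d+).*$", "\\1", names(Grade.Data), perl = TRUE)

# Guard against two columns collapsing to the same name before overwriting.
stopifnot(anyDuplicated(new_names) == 0)
names(Grade.Data) <- new_names
print(names(Grade.Data))
```

Student.ID and Grade contain no digits, so they pass through untouched, while the other three names are cut after their first number.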

This worked perfectly! Thank you so much. Thank you everyone for your help as well.

KenHBS · Answer · 2017-08-14 18:13:28Z

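There is nothing wrong with your regex itself. grep returns the indices of the elements that match, which is why you got a vector of integers instead of names. What you are probably looking for is the combination of regexpr, which returns the start position and length of the first match in each string, and regmatches, which extracts the matched substrings from the output of regexpr: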

start_end <- regexpr(".*[0-9]", names(Grade.Data))
regmatches(names(Grade.Data), start_end)
# [1] "Interactive.Exercises.1" "Interactive.Exercises.2"
# [3] "Quizzes.1..Week.1" "Quizzes.2..Week.2"
# [5] "Case.Studies.1..Case.Study1"

Adding a question mark after the dot-star makes the match lazy, so it matches as few characters as possible and stops at the first digit:

start_end <- regexpr(".*?[0-9]", names(Grade.Data))
regmatches(names(Grade.Data), start_end)
# [1] "Interactive.Exercises.1" "Interactive.Exercises.2"
# [3] "Quizzes.1" "Quizzes.2"
# [5] "Case.Studies.1"

This works, but how would I change the actual names of the columns? I tried names(Grade.Data) <- start_end, but it changed the column names to -1 1 1 1 1 -1 and so forth.
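
One possible way to get from the matches to renamed columns, sketched on a made-up data frame: regexpr returns -1 for names with no match, and regmatches drops those names, so its result is shorter than names(Grade.Data). Assigning only to the positions that matched keeps the two aligned. The example data frame and the `matched` variable are illustrative assumptions, and `perl = TRUE` is added here only to make the lazy-quantifier behaviour explicit.

```r
# Hypothetical stand-in for the real data, using names from the question.
Grade.Data <- data.frame(Student.ID = 1, Grade = 90,
                         Quizzes.1..Week.1.Quiz = 8,
                         Case.Studies.1..Case.Study1 = 7)

# Match data: start positions (-1 where a name has no digit) plus match lengths.
start_end <- regexpr(".*?[0-9]", names(Grade.Data), perl = TRUE)

# regmatches() returns substrings only for the names that matched,
# so replace just those positions and leave the others alone.
matched <- start_end > 0
names(Grade.Data)[matched] <- regmatches(names(Grade.Data), start_end)
print(names(Grade.Data))
```

Note that `[0-9]` matches a single digit, so a name such as Quizzes.10..Week.10.Quiz would be cut to Quizzes.1; using `[0-9]+` instead keeps the whole number.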

Manuel Sánchez Mendoza · Answer · 2017-08-14 18:28:56Z

You should use the names function. Here is a small example; the character vector should contain one name for each column.

names(x = Grade.Data) <- c("Col1_name", "Col2_name")

Syrah.Sharpe Over a year ago

Except, I have so many variables, I don't want to rename each one individually. I was hoping to find something that would work with what I have already and just keep the selection I wanted.
