
I am working in R for the first time and am having difficulty renaming columns in a data frame (Grade.Data). The dataset was imported from a csv file and has column names like this:

Student.ID Grade Interactive.Exercises.1..Health Interactive.Exercises.2..Fitness Quizzes.1..Week.1.Quiz Quizzes.2..Week.2.Quiz Case.Studies.1..Case.Study1 Case.Studies.2..Case.Study2

I would like to simplify the variable names, e.g. from Interactive.Exercises.1..Health to Interactive.Exercises.1, or from Quizzes.1..Week.1.Quiz to Quizzes.1.

So far, I have tried this:

grep(".*[0-9]", names(Grade.Data)) 

But I get this returned:

[1] 3 4 5 6 7 8 9 11 12 13 14 15 16 17 19 20 21 22 23 24 25 

Can anyone help me figure out what is going on, and write a better regular expression? Thank you so much.

  • I think you want names(Grade.Data) <- sub("^(.*[^.])\\..*$", "\\1", names(Grade.Data)). What about Case.Studies.2..Case.Study2, what is the expected output? Also, try "^(.*[^.])\\.{2}.*" pattern. Commented Aug 14, 2017 at 18:38
  • I would like to have Case.Studies.2..Case.Study2 change to Case.Studies2, from the first half of the string. Commented Aug 14, 2017 at 18:56

3 Answers


It seems you want to truncate each column name after its first chunk of digits.

You may use the following sub solution:

names(Grade.Data) <- sub("^(.*?\\d+).*$", "\\1", names(Grade.Data)) 

See the regex demo

Details

  • ^ - start of string
  • (.*?\\d+) - Group 1 (later referred to as \1 in the replacement pattern), matching any 0+ chars, as few as possible (.*?), and then 1 or more digits (\d+)
  • .* - any 0+ chars as many as possible
  • $ - end of string
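
As a rough illustration of the pattern above, here is a minimal sketch run on a plain character vector. The vector cols is a hypothetical stand-in modelled on the column names in the question, not the real Grade.Data.

```r
# Hypothetical column names modelled on those in the question
cols <- c("Student.ID", "Grade",
          "Interactive.Exercises.1..Health",
          "Quizzes.1..Week.1.Quiz",
          "Case.Studies.2..Case.Study2")

# Keep everything up to and including the first digit chunk;
# names without any digit do not match and are returned unchanged
short <- sub("^(.*?\\d+).*$", "\\1", cols)
print(short)
# [1] "Student.ID"              "Grade"
# [3] "Interactive.Exercises.1" "Quizzes.1"
# [5] "Case.Studies.2"
```

Because sub leaves non-matching strings untouched, columns such as Student.ID and Grade should keep their original names when the result is assigned back with names(Grade.Data) <- ....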

1 Comment

This worked perfectly! Thank you so much. Thank you everyone for your help as well.

There is nothing wrong with your regex itself; grep simply returns the indices of the elements that match, not the matched text. What you are looking for is probably the combination of regexpr, which finds the start position and length of each match, and regmatches, which extracts the actual substrings described by the output of regexpr:

start_end <- regexpr(".*[0-9]", names(Grade.Data))
regmatches(names(Grade.Data), start_end)
# [1] "Interactive.Exercises.1"     "Interactive.Exercises.2"
# [3] "Quizzes.1..Week.1"           "Quizzes.2..Week.2"
# [5] "Case.Studies.1..Case.Study1"

Adding a question mark after the dot-star makes the regex match as few characters as possible, so the match stops at the first digit:

start_end <- regexpr(".*?[0-9]", names(Grade.Data))
regmatches(names(Grade.Data), start_end)
# [1] "Interactive.Exercises.1" "Interactive.Exercises.2"
# [3] "Quizzes.1"               "Quizzes.2"
# [5] "Case.Studies.1"

1 Comment

This works, but how would I change the actual names of the columns? I tried names(Grade.Data) <-start_end but it changed the column names to -1 1 1 1 1 -1 and so forth.
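
One possible way to do this, sketched under assumptions: start_end holds match positions (with -1 for names that did not match), which is why assigning it directly produces names like -1 and 1. The matched text comes from regmatches, and it only covers the names that matched, so it has to be written back into those positions. The data frame below is a hypothetical stand-in for Grade.Data, and old_names and new_names are illustrative helper names.

```r
# Hypothetical data frame standing in for Grade.Data
Grade.Data <- data.frame(
  Student.ID = 1:2,
  Grade = c(90, 85),
  Quizzes.1..Week.1.Quiz = c(10, 9),
  Case.Studies.2..Case.Study2 = c(7, 8)
)

old_names <- names(Grade.Data)
start_end <- regexpr(".*?[0-9]", old_names)

# regmatches() returns only the matched pieces, so write them into
# the positions that matched and leave the other names unchanged
new_names <- old_names
new_names[start_end != -1] <- regmatches(old_names, start_end)
names(Grade.Data) <- new_names

print(names(Grade.Data))
# [1] "Student.ID"     "Grade"          "Quizzes.1"      "Case.Studies.2"
```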

You should use the names function. Here is a small example; the vector of names can be as long as you need, but it must supply one name per column.

names(x = Grade.Data) <- c("Col1_name", "Col2_name") 

1 Comment

Except, I have so many variables, I don't want to rename each one individually. I was hoping to find something that would work with what I have already and just keep the selection I wanted.
