gsub is a function that lets us match and replace patterns in strings, but I'm having a hard time understanding its underlying logic. For example, I want to extract the last part of each of these strings (the extension):

files = c("tmp-project.csv", "project.csv", "project2-csv-specs.csv",
          "project2.csv2.specs.xlsx", "project_cars.ods", "project-houses.csv",
          "Project_Trees.csv", "project-cars.R", "project-houses.r",
          "project-final.xls", "Project-final2.xlsx")
gsub("\\.[a-zA-Z]*$", "\\1", files)

What I get is the opposite of what I want: the file names with the extension removed.

 [1] "tmp-project"         "project"             "project2-csv-specs"
 [4] "project2.csv2.specs" "project_cars"        "project-houses"
 [7] "Project_Trees"       "project-cars"        "project-houses"
[10] "project-final"       "Project-final2"

What am I doing wrong, and what's the logic of gsub? I know the stringr package handles this kind of problem easily, but I'm looking for a base R solution. Thank you.

  • You use \1 as the replacement pattern, but have not defined any capturing group in the pattern. gsub replaces matches. To extract with gsub, you need to match the whole string and capture what you need to keep. So, use gsub(".*\\.([a-zA-Z]*)$", "\\1", files). Commented Feb 13, 2020 at 21:32
  • This definitely answers all my questions. Thanks a lot, I was really struggling with gsub. Commented Feb 13, 2020 at 21:37
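
To make the comment's point concrete, here is a minimal sketch in base R using a shortened version of the files vector from the question. The variable names stripped, ext and ext2 are only illustrative. The last line uses tools::file_ext, which is not mentioned in the thread; it is offered as a possible alternative that ships with R, not as the fix the commenter proposed.

```r
# Shortened sample of the file names from the question.
files <- c("tmp-project.csv", "project2.csv2.specs.xlsx", "project-cars.R")

# Original attempt: the pattern matches only the trailing ".ext", and "\\1"
# refers to a capture group that the pattern never defines, so the matched
# extension is replaced rather than kept.
stripped <- gsub("\\.[a-zA-Z]*$", "\\1", files)

# Suggested fix: match the WHOLE string, capture the extension in (...),
# and replace the entire match with just the captured group.
ext <- gsub(".*\\.([a-zA-Z]*)$", "\\1", files)

# Possible alternative (an addition, not from the thread): the tools
# package that ships with R has a helper for file extensions.
ext2 <- tools::file_ext(files)

print(ext)
print(ext2)
```

The rule of thumb is that gsub never returns "the match"; it returns the input with every match swapped for the replacement. Extraction therefore works only when the pattern consumes everything and the replacement puts back the part to keep.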
