gsub is a function that lets us extract and replace patterns in strings, but I'm having a hard time understanding its underlying logic. For example, I want to extract the last part of each of these strings (the extension):
    files = c( "tmp-project.csv", "project.csv", "project2-csv-specs.csv", "project2.csv2.specs.xlsx", "project_cars.ods", "project-houses.csv", "Project_Trees.csv", "project-cars.R", "project-houses.r", "project-final.xls", "Project-final2.xlsx" )
    gsub("\\.[a-zA-Z]*$", "\\1", files)

What I get is anything but the string I want:

     [1] "tmp-project"         "project"             "project2-csv-specs"
     [4] "project2.csv2.specs" "project_cars"        "project-houses"
     [7] "Project_Trees"       "project-cars"        "project-houses"
    [10] "project-final"       "Project-final2"

What am I doing wrong, and what's the logic of gsub? I know the stringr package handles this kind of problem easily, but I'm looking for a base R solution. Thank you.
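You are using \1 as the replacement pattern, but have not defined any capturing group in the pattern. gsub replaces matches: with no group for \1 to refer to, the replacement is empty, so each matched extension is simply deleted, which is exactly the output you got. To extract with gsub, you need to match the whole string and capture what you need to keep. So:

    gsub(".*\\.([a-zA-Z]*)$", "\\1", files)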
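
To see the difference side by side, here is a minimal sketch on a shortened version of the vector from the question. The variable names `removed`, `ext`, and `ext_tools` are only illustrative. The `tools::file_ext()` call is not part of the original answer; it is added as a cross-check using a helper that ships with R.

```r
# Three of the file names from the question, enough to show the behaviour.
files <- c("tmp-project.csv", "project2.csv2.specs.xlsx", "project-cars.R")

# No capture group: "\\1" is empty, so the matched extension is deleted.
removed <- gsub("\\.[a-zA-Z]*$", "\\1", files)
print(removed)
# [1] "tmp-project"         "project2.csv2.specs" "project-cars"

# Match the whole string, capture the extension, keep only the capture.
# The greedy ".*" consumes everything up to the last dot.
ext <- gsub(".*\\.([a-zA-Z]*)$", "\\1", files)
print(ext)
# [1] "csv"  "xlsx" "R"

# Cross-check with the helper from the 'tools' package (an addition here,
# not something the original answer used).
ext_tools <- tools::file_ext(files)
print(ext_tools)
# [1] "csv"  "xlsx" "R"
```

One caveat with the gsub approach: a string that does not match the pattern at all (for example, a name with no dot) comes back unchanged rather than as an empty string, so it may be worth checking for that case if the input is not known to be clean.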