How can I extract the extension of a file given a file path as a character string? I know I can do this with a regular expression, regexpr("\\.([[:alnum:]]+)$", x), but I'm wondering whether there is a built-in function for this.
9 Answers
This is the sort of thing that is easily found with R's basic tools, e.g. ??path.
Anyway, load the tools package and read ?file_ext.
9 Comments
??"extensions" although one would have expected that it would.
sos does a full text search. ?? only searches metadata (title, keywords, etc.). Furthermore, it's not that hard to skim the results. (I tried findFn("{file extension}"), "extract {file extension}", and "{extract file extension}"; the first was best.)
Let me extend a little the great answer from https://stackoverflow.com/users/680068/zx8754
Here is a simple code snippet:
    # 1. Load library 'tools'
    library("tools")
    # 2. Get extension for file 'test.txt'
    file_ext("test.txt")
The result should be 'txt'.
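As a small sketch of the same call on several paths at once, the snippet below uses the namespace-qualified form so that nothing has to be attached with library(). The paths are made up for illustration.

```r
# Made-up example paths; only the strings matter, the files need not exist.
paths <- c("data/report.csv", "archive.tar.gz", "README", "notes.backup.txt")

# tools::file_ext() is vectorised and returns "" when it finds no extension.
exts <- tools::file_ext(paths)
print(exts)
# [1] "csv" "gz"  ""    "txt"
```

Only the part after the last dot is returned, so "archive.tar.gz" yields "gz" rather than "tar.gz".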
4 Comments
library(tools) when you can simply use tools::file_ext, such as in tools::file_ext("test.txt").
A simple function with no package to load:
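    # Returns every dot-separated piece after the first one,
    # e.g. "tar" "gz" for "a.tar.gz"; only the first element of `file` is used.
    getExtension <- function(file){
      ex <- strsplit(basename(file), split="\\.")[[1]]
      return(ex[-1])
    }
1 Comment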
getExtension <- function(file) strsplit(file, ".", fixed=T)[[1]][-1]. To avoid regex and increase performance, fixed = TRUE can be used.
The regexpr above fails if the extension contains non-alphanumeric characters (see e.g. https://en.wikipedia.org/wiki/List_of_filename_extensions). As an alternative, one may use the following function:
    getFileNameExtension <- function(fn) {
      # remove the path (or use .Platform$file.sep instead of '/')
      splitted <- strsplit(x = fn, split = '/')[[1]]
      fn <- splitted[length(splitted)]
      ext <- ''
      splitted <- strsplit(x = fn, split = '\\.')[[1]]
      l <- length(splitted)
      # the extension must be the suffix of a non-empty name
      if (l > 1 && sum(splitted[1:(l-1)] != '')) ext <- splitted[l]
      ext
    }
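The function above handles a single path. A minimal sketch of how it might be applied to a character vector follows; the wrapper name getFileNameExtensions and the example paths are hypothetical and not part of the answer. The function is repeated so the block runs on its own.

```r
# Copy of the answer's function, reformatted so this sketch is self-contained.
getFileNameExtension <- function(fn) {
  splitted <- strsplit(x = fn, split = "/")[[1]]
  fn <- splitted[length(splitted)]
  ext <- ""
  splitted <- strsplit(x = fn, split = "\\.")[[1]]
  l <- length(splitted)
  if (l > 1 && sum(splitted[1:(l - 1)] != "")) ext <- splitted[l]
  ext
}

# Hypothetical wrapper (an assumption, not from the answer): one call per path.
getFileNameExtensions <- function(paths) {
  vapply(paths, getFileNameExtension, character(1), USE.NAMES = FALSE)
}

# Made-up example paths.
paths <- c("dir/file.zi_", "archive.tar.gz", "noExtension", ".bashrc")
exts <- getFileNameExtensions(paths)
print(exts)
# [1] "zi_" "gz"  ""    ""
```

Note that ".bashrc" yields "" here because the part before the dot is empty, which is the "suffix of a non-empty name" rule stated in the function's comment.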
3 Comments
basename and dirname obviate some of the work here
One way would be to use sub.
    s <- c("test.txt", "file.zi_", "noExtension", "with.two.ext2", "file.with.final.dot.", "..", ".", "")
    sub(".*\\.|.*", "", s, perl=TRUE)
    #[1] "txt"  "zi_"  ""     "ext2" ""     ""     ""     ""
Assuming there is a dot (this fails when there is no extension, because the whole name is returned):
    sub(".*\\.", "", s)
    #[1] "txt"         "zi_"         "noExtension" "ext2"        ""
    #[6] ""            ""            ""
For comparison, here are tools::file_ext(s) and the regex it uses internally.
    tools::file_ext(s)
    #[1] "txt"  ""     ""     "ext2" ""     ""     ""     ""
    pos <- regexpr("\\.([[:alnum:]]+)$", s)
    ifelse(pos > -1L, substring(s, pos + 1L), "")
    #[1] "txt"  ""     ""     "ext2" ""     ""     ""     ""
If you don't want to use any additional package you could try
    file_extension <- function(filenames) {
      sub(pattern = "^(.*\\.|[^.]+)(?=[^.]*)", replacement = "", filenames, perl = TRUE)
    }
If you like to be cryptic you could try to use it as a one-line expression: sub("^(.*\\.|[^.]+)(?=[^.]*)", "", filenames, perl = TRUE) ;-)
It works for zero (!), one, or more file names (as a character vector or list) with an arbitrary number of dots, and also for file names without any extension, where it returns the empty string "".
Here are the tests I tried:
    > file_extension("simple.txt")
    [1] "txt"
    > file_extension(c("no extension", "simple.ext1", "with.two.ext2", "some.awkward.file.name.with.a.final.dot.", "..", ".", ""))
    [1] ""     "ext1" "ext2" ""     ""     ""     ""
    > file_extension(list("file.ext1", "one.more.file.ext2"))
    [1] "ext1" "ext2"
    > file_extension(NULL)
    character(0)
    > file_extension(c())
    character(0)
    > file_extension(list())
    character(0)
By the way, tools::file_ext() has trouble finding "strange" extensions with non-alphanumeric characters:
    > tools::file_ext("file.zi_")
    [1] ""
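One possible way around this limitation, sketched under the assumption that an extension may contain any character except a dot or a forward slash, is to keep the regexpr/ifelse structure quoted earlier in this thread and only widen the character class. The name file_ext_any is hypothetical and is not a function of the tools package.

```r
# Hypothetical variant of the regexpr-based approach: accept any characters
# other than "." and "/" after the final dot, instead of only [[:alnum:]].
file_ext_any <- function(x) {
  pos <- regexpr("\\.([^./]+)$", x)
  ifelse(pos > -1L, substring(x, pos + 1L), "")
}

# Same test vector as used elsewhere in this thread.
s <- c("test.txt", "file.zi_", "noExtension", "with.two.ext2",
       "file.with.final.dot.", "..", ".", "")
exts <- file_ext_any(s)
print(exts)
# [1] "txt"  "zi_"  ""     "ext2" ""     ""     ""     ""
```

Whether a looser class like this is appropriate depends on the file names involved; it treats anything after the last dot as an extension, so it is a trade-off rather than a drop-in replacement.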
This function uses pipes:
    library(magrittr)
    # Takes the second dot-separated piece, so it assumes a single name
    # containing exactly one dot.
    file_ext <- function(f_name) {
      f_name %>%
        strsplit(".", fixed = TRUE) %>%
        unlist %>%
        extract(2)
    }
    file_ext("test.txt")
    # [1] "txt"
3 Comments
tools::file_ext?
tools function
tools::file_ext works fine.
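For completeness, the split-on-dot idea from the pipe-based answer above can be sketched in base R so that it takes the last piece rather than the second and works on a whole vector. The helper name last_piece_ext and the example paths are hypothetical.

```r
# Hypothetical helper (not from the thread): last dot-separated piece of each
# file name, or "" when the name contains no dot.
last_piece_ext <- function(f_names) {
  pieces <- strsplit(basename(f_names), ".", fixed = TRUE)
  vapply(
    pieces,
    function(p) if (length(p) > 1L) p[length(p)] else "",
    character(1)
  )
}

# Made-up example paths.
exts <- last_piece_ext(c("test.txt", "dir/archive.tar.gz", "noExtension"))
print(exts)
# [1] "txt" "gz"  ""
```

Unlike tools::file_ext, this sketch does not restrict the extension to alphanumeric characters, so the two can disagree on names such as "file.zi_".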