4

OK, I'm giving up and asking the question: I read through the regex help article and still don't have a clue what I'm looking for.

I have a list of files:

files <- c("files_combined.csv","file_1-10.csv","file_11-20.csv", "file_21-30.csv","file_2731-2740.csv","file_2731-2740.txt") 

I want only the files that start with "file_" and end with ".csv". I know it looks something like this:

grep(pattern = "^file_???.csv$" ,files) 

But I need to find the correct regular expression that ignores the number of characters between the first and the second pattern ("file_" + ".csv"). I'd really appreciate it if somebody could point me to a complete list of the regular expressions in R, since it is tedious to read through the help every time and, as in my case, sometimes unsuccessful.

6
  • 2
    Something like "^file_.*\\.csv$" maybe? Commented Jan 4, 2016 at 16:49
  • 2
    Does grep("^file_.+\\.csv$",files,value=T) do what you want? Commented Jan 4, 2016 at 16:49
  • For future reference, I have this regex cheatsheet printed out and pasted on my office wall (the site seems to be down for now, but I'm sure there are other versions). I know regex seems intimidating at first... and the R docs are notoriously unedifying (still waiting on news about SO dox...). Perhaps you should also try poking around in some R regex questions, see here. Commented Jan 4, 2016 at 16:59
  • 1
    There is a site called "txt2re" which can create a regex pattern from a given string. Maybe this can help you in the future! Commented Jan 4, 2016 at 17:05
  • 1
    @MichaelChirico There's a pdf version of that cheatsheet accessible through the Wayback Machine: web.archive.org/web/20111024203537/http://www.cheatography.com/… Commented Jan 4, 2016 at 19:21

2 Answers

5

R offers a function for doing wildcard expansion using glob patterns for those who don't like regex:

files <- Sys.glob("file_*.csv") 

This should match your pattern.
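Note that Sys.glob expands the wildcard against files that actually exist on disk, not against a character vector. A minimal sketch of that behaviour, assuming a scratch directory under tempdir() (the name "glob_demo" and the empty files are made up for illustration):

```r
# Scratch directory with a few empty files; names borrowed from the question.
demo_dir <- file.path(tempdir(), "glob_demo")
dir.create(demo_dir, showWarnings = FALSE)
example_names <- c("files_combined.csv", "file_1-10.csv",
                   "file_11-20.csv", "file_2731-2740.txt")
invisible(file.create(file.path(demo_dir, example_names)))

# Sys.glob looks at the file system: '*' matches zero or more characters.
matched <- basename(Sys.glob(file.path(demo_dir, "file_*.csv")))
print(matched)
```

Only the two "file_" CSVs should come back; "files_combined.csv" is skipped because the character after "file" is "s", not "_".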


6 Comments

That was my first guess as well. But it seems R has a different logic here. It's not working for me at least...
Well the documentation confirms that all systems should interpret '*' as "match zero or more characters", so it should match anything starting with "file_" and ending in ".csv", as required (files in current working directory).
So this is what happens:

> files <- c("files_combined.csv","file_1-10.csv","file_11-20.csv",
+ "file_21-30.csv","file_2731-2740.csv","file_2731-2740.txt")
> files <- Sys.glob("file_*.csv")
> files
character(0)

compared to the other answer:

> grep("^file_.+\\.csv$",files,ignore.case = T)
[1] 2 3 4 5
Okay, possibly stupid question here. Do you actually have such files in your current working directory? Your first line creates a character vector; your second line (the Sys.glob one) actually looks in your current working directory for files that match that pattern. Does list.files() return a vector that includes the desired files?
I see now where my mistake was. The pattern "file_*.csv" does not work with grep or list.files. That's why I couldn't replicate it on my example character vector. But you are 100% right when it comes to finding the actual files I was looking for (which fit the pattern "mesa_*.csv"; the other one was just the example...)! It seems that Sys.glob uses wildcards while the other functions in R employ regular expressions (I wasn't aware of the difference so far).
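One way to bridge the two worlds, sketched here as a suggestion rather than something proposed in the thread, is utils::glob2rx, which translates a wildcard pattern into a regular expression that grep or list.files(pattern = ...) can use:

```r
files <- c("files_combined.csv", "file_1-10.csv", "file_11-20.csv",
           "file_21-30.csv", "file_2731-2740.csv", "file_2731-2740.txt")

# Convert the glob into an anchored regular expression.
pattern <- utils::glob2rx("file_*.csv")
print(pattern)

# The converted pattern works on a plain character vector.
csv_files <- grep(pattern, files, value = TRUE)
print(csv_files)
```

The same converted pattern could be passed to list.files(pattern = pattern) to search a directory instead of a vector.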
2

Thanks a lot! David Arenburg and Heroka, it seems you came up with the solution at the same time. Also thanks to MichaelChirico for providing the cheatsheet.

This is the answer to my specific problem:

grep("^file_.+\\.csv$", files, ignore.case = TRUE)
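For readers unfamiliar with the syntax, here is a rough breakdown of that pattern, followed by a small comparison of ".+" with the ".*" variant from the comments. The edge case "file_.csv" is a made-up name used only to show the difference:

```r
files <- c("files_combined.csv", "file_1-10.csv", "file_11-20.csv",
           "file_21-30.csv", "file_2731-2740.csv", "file_2731-2740.txt")

# ^file_   the string must start with "file_"
# .+       one or more arbitrary characters (".*" would allow zero)
# \\.      a literal dot (the backslash is doubled inside an R string)
# csv$     the string must end with "csv"
idx <- grep("^file_.+\\.csv$", files)            # positions of the matches
keep <- grepl("^file_.+\\.csv$", files)          # logical vector, one entry per file
print(files[idx])

# Hypothetical edge case: nothing between "file_" and ".csv".
plus_match <- grepl("^file_.+\\.csv$", "file_.csv")
star_match <- grepl("^file_.*\\.csv$", "file_.csv")
print(c(plus = plus_match, star = star_match))
```

By default grep returns positions; adding value = TRUE returns the matching file names directly.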

For regex problems in general, txt2re is helpful as well.

4 Comments

This is not an answer.
Heroka, you gave the answer but I cannot accept it as you posted it as a comment. I would like to close this...
@Idos The answer has been improved and now answers the question.
@JonGrub I don't really mind, it would be literally the same answer as this. Close it by accepting your own answer.
