10

I'm trying to create a std::regex(__FILE__) as part of a unit test that checks some exception output that prints the file name.

On Windows it fails with:

regex_error(error_escape): The expression contained an invalid escaped character, or a trailing escape.

because the __FILE__ macro expansion contains un-escaped backslashes.

Is there a more elegant way to escape the backslashes than to loop through the resulting string (i.e. with a std algorithm or some std::string function)?
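For anyone who wants to reproduce the failure without a Windows build, here is a minimal sketch. The helper name CompilesAsRegex and the sample path are made up for illustration. The path is chosen because \u followed by non-hex characters is, as far as I know, rejected as an invalid escape by the default ECMAScript grammar on the major standard library implementations, whereas other stray escapes may be accepted by some of them.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: reports whether `pattern` can be compiled by std::regex
// using the default (ECMAScript) grammar.
bool CompilesAsRegex(const std::string &pattern) {
  try {
    std::regex re(pattern);
    return true;
  } catch (const std::regex_error &) {
    return false;
  }
}

// Illustrative stand-in for what __FILE__ might expand to on Windows.
// The "\u" in "\users" is not followed by four hex digits.
const std::string kSamplePath = R"(C:\users\me\test.cpp)";
```

With these definitions, CompilesAsRegex(kSamplePath) is expected to return false, while the same path with every backslash doubled (and the dot escaped) compiles.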

25
  • __FILE__ should only print the filename. do you need the full path? Commented Aug 30, 2016 at 13:34
  • 2
    @Hayt "__FILE__ should only print the filename." Not necessarily Commented Aug 30, 2016 at 13:35
  • yeah if he does not need them he can look that up here: msdn.microsoft.com/en-us/library/027c4t2s.aspx assuming the problem is not the missing quotation marks, which you have already answered. And assuming he uses MSVC compiler Commented Aug 30, 2016 at 13:37
  • 1
    @NicolasHolthaus Maybe std::transform() plus a lambda function could be helpful to write it in an elegant way. Commented Aug 30, 2016 at 13:47
  • 1
    Maybe it's best to write your own function that goes through the string char by char, copies it, and adds another \ whenever it finds one.

3 Answers

8

File paths can contain many characters that have special meaning in regular expression patterns. Escaping just the backslashes is not enough for robust checking in the general case.

Even a simple path, like C:\Program Files (x86)\Vendor\Product\app.exe, contains several special characters. If you want to turn that into a regular expression (or part of a regular expression), you would need to escape not only the backslashes but also the parentheses and the period (dot).
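To make that concrete, here is a small sketch (the function names are hypothetical, not from the question) that doubles only the backslashes and then checks whether the resulting pattern still matches the path it was built from. For the path above it does not: the unescaped parentheses become a capture group, so the pattern expects "x86" where the path has "(x86)".

```cpp
#include <cassert>
#include <regex>
#include <string>

// Doubles every backslash and leaves all other characters untouched.
std::string EscapeBackslashesOnly(const std::string &s) {
  std::string out;
  for (char ch : s) {
    if (ch == '\\') out.push_back('\\');
    out.push_back(ch);
  }
  return out;
}

// Returns true if `path`, turned into a pattern by EscapeBackslashesOnly,
// matches `path` itself in full.
bool BackslashOnlyPatternMatchesItself(const std::string &path) {
  const std::regex pattern(EscapeBackslashesOnly(path));
  return std::regex_match(path, pattern);
}
```

The unescaped dot is a quieter problem: it compiles and matches, but it also matches any other character in that position, so a test could pass for the wrong file name.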

Fortunately, we can solve our regular expression problem with more regular expressions:

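    #include <regex>
    #include <string>

    std::string EscapeForRegularExpression(const std::string &s) {
      // The character class starts with an escaped backslash so that
      // backslashes in the input are escaped too.
      static const std::regex metacharacters(R"([\\\.\^\$\+\(\)\[\]\{\}\|\?\*])");
      // "$&" stands for the matched character, so each match becomes "\<match>".
      return std::regex_replace(s, metacharacters, "\\$&");
    }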

(File paths can't contain * or ?, but I've included them to keep the function general.)

If you don't abide by the "no raw loops" guideline, a probably faster implementation would avoid regular expressions:

    #include <cstring>
    #include <string>

    std::string EscapeForRegularExpression(const std::string &s) {
      static const char metacharacters[] = R"(\.^$+()[]{}|?*)";
      std::string out;
      out.reserve(s.size());
      for (auto ch : s) {
        // Prefix every metacharacter with a backslash.
        if (std::strchr(metacharacters, ch))
          out.push_back('\\');
        out.push_back(ch);
      }
      return out;
    }

Although the loop adds some clutter, this approach allows us to drop a level of escaping on the definition of metacharacters, which is a readability win over the regex version.
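Here is a sketch of how the escaped path might be used in the kind of unit test the question describes. MessageMentionsFile is a hypothetical helper, and the "file:line" message format is an assumption on my part, since the question doesn't show the actual exception output. The loop-based escaper is repeated so the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstring>
#include <regex>
#include <string>

// Loop-based escaper from this answer, repeated for self-containment.
std::string EscapeForRegularExpression(const std::string &s) {
  static const char metacharacters[] = R"(\.^$+()[]{}|?*)";
  std::string out;
  out.reserve(s.size());
  for (auto ch : s) {
    if (std::strchr(metacharacters, ch))
      out.push_back('\\');
    out.push_back(ch);
  }
  return out;
}

// Hypothetical helper: true if `message` contains `file` followed by a colon
// and a line number (assumed format "path:123").
bool MessageMentionsFile(const std::string &message, const std::string &file) {
  const std::regex pattern(EscapeForRegularExpression(file) + R"(:\d+)");
  return std::regex_search(message, pattern);
}
```

In a real test, the second argument would be std::string(__FILE__) and the first would be the what() text of the caught exception.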


3 Comments

@Nicolas Holthaus: Sean Parent of Adobe proposes the "no raw loops" idea in this video: channel9.msdn.com/Events/GoingNative/2013/Cpp-Seasoning
Why would you escape the - character? Isn't that a normal character except within character classes ([...])?
@AlexisWilke: That was a mistake on my part. I looked at summary sheets for several regular expression grammars to come up with the superset of special characters and must've misread one of them. I'll fix the answer.
1

Here is polymapper.

It takes an operation that takes an element and returns a range, the "map operation".

It produces a function object that takes a container, and applies the "map operation" to each element. It returns the same type as the container, where each element has been expanded/contracted by the "map operation".

    #include <iterator>
    #include <type_traits>
    #include <utility>
    #include <vector>

    template<class Op>
    auto polymapper( Op&& op ) {
      return [op=std::forward<Op>(op)](auto&& r) {
        using std::begin;
        using R = std::decay_t<decltype(r)>;
        using iterator = decltype( begin(r) );
        using T = typename std::iterator_traits<iterator>::value_type;
        // Collect the expanded elements, then rebuild a container of the input's type.
        std::vector<T> data;
        for (auto&& e : decltype(r)(r)) {
          for (auto&& out : op(e)) {
            data.push_back(out);
          }
        }
        return R{ data.begin(), data.end() };
      };
    }

Here is escape_stuff:

    auto escape_stuff = polymapper([](char c)->std::vector<char> {
      if (c != '\\') return {c};
      else return {c,c};
    });

live example.

    int main() {
      std::cout << escape_stuff(std::string(__FILE__)) << "\n";
    }

The advantage of this approach is that the action of messing with the guts of the container is factored out. You write code that messes with the characters or elements, and the overall logic is not your problem.

The disadvantage is polymapper is a bit strange, and needless memory allocations are done. (Those could be optimized out, but that makes the code more convoluted).
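To show the "expanded/contracted" part on something other than a string, here is a sketch that runs the same polymapper over a std::list<int>. The function DuplicateEvensDropOdds and its map operation are invented for illustration; polymapper is repeated from above so the snippet stands alone. It needs C++14 or later.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <type_traits>
#include <utility>
#include <vector>

template <class Op>
auto polymapper(Op&& op) {
  return [op = std::forward<Op>(op)](auto&& r) {
    using std::begin;
    using R = std::decay_t<decltype(r)>;
    using iterator = decltype(begin(r));
    using T = typename std::iterator_traits<iterator>::value_type;
    std::vector<T> data;
    for (auto&& e : decltype(r)(r)) {
      for (auto&& out : op(e)) {
        data.push_back(out);
      }
    }
    return R{data.begin(), data.end()};
  };
}

// Hypothetical example: odd numbers map to an empty range (contracted away),
// even numbers map to two copies (expanded).
std::list<int> DuplicateEvensDropOdds(const std::list<int>& xs) {
  auto op = polymapper([](int x) -> std::vector<int> {
    if (x % 2 != 0) return {};
    return {x, x};
  });
  return op(xs);
}
```

The result has the same container type as the input, here std::list<int>, even though the intermediate storage is a std::vector.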


1

EDIT

In the end, I switched to @AdrianMcCarthy 's more robust approach.


Here's the inelegant way I solved the problem, in case someone stumbles on this while actually looking for a workaround:

    std::string escapeBackslashes(const std::string& s) {
      std::string out;
      for (auto c : s) {
        out += c;
        if (c == '\\')
          out += c;
      }
      return out;
    }

and then

    std::regex(escapeBackslashes(__FILE__));

It's O(N) which is probably as good as you can do here, but involves a lot of string copying which I'd like to think isn't strictly necessary.
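One way to trim the copying a little, sketched here under the assumption that reallocation is the cost that matters: count the backslashes first and reserve the exact output size, so the result string is allocated at most once. escapeBackslashesReserved is a hypothetical variant name, and I haven't measured whether it is faster in practice.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Variant of escapeBackslashes that sizes the output up front.
std::string escapeBackslashesReserved(const std::string& s) {
  // Each backslash adds exactly one extra character to the output.
  const auto extra = std::count(s.begin(), s.end(), '\\');
  std::string out;
  out.reserve(s.size() + static_cast<std::string::size_type>(extra));
  for (char c : s) {
    out += c;
    if (c == '\\')
      out += c;
  }
  return out;
}
```

It still makes two passes over the input and builds a new string, so it remains O(N); it only avoids repeated growth of the output buffer.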

2 Comments

All this does is escape the backslashes, which is insufficient for transforming a Windows file path into a valid regular expression pattern. It doesn't do anything with other regular expression meta characters that can be in path names, like parentheses.
@AdrianMcCarthy sure, but that's all it was intended to do. It was meant for a unit test, not as a general purpose regex maker, and solved the one and only one problem I needed it to.
