31

I'm just getting my head around regular expressions, and I'm using the Boost Regex library.

I have a need to use a regex that includes a specific URL, and it chokes because obviously there are characters in the URL that are reserved for regex and need to be escaped.

Is there any function or method in the Boost library to escape a string for this kind of usage? I know there are such methods in most other regex implementations, but I don't see one in Boost.

Alternatively, is there a list of all characters that would need to be escaped?

4 Answers

42
. ^ $ | ( ) [ ] { } * + ? \ 
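If you would rather not involve a regex at all, a plain loop over the string is enough. This is only a minimal sketch using the standard library: `escape_regex_chars` is a made-up name, not something Boost provides, and it assumes the list above covers every character you care about.

```cpp
#include <string>

// Hypothetical helper (not part of Boost): puts a backslash in front of
// every character from the list above and copies everything else as-is.
std::string escape_regex_chars(const std::string& input) {
    // The characters that are special in a regex; the last one is a backslash.
    static const std::string special = ".^$|()[]{}*+?\\";

    std::string out;
    out.reserve(input.size() * 2);  // worst case: every character is escaped
    for (char c : input) {
        if (special.find(c) != std::string::npos) {
            out += '\\';
        }
        out += c;
    }
    return out;
}
```

The result can then be concatenated into a larger pattern before it is handed to the regex constructor.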

Ironically, you could use a regex to escape your URL so that it can be inserted into a regex.

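    // Matches any single character that is special in a regex.
    const boost::regex esc("[.^$|()\\[\\]{}*+?\\\\]");
    // The replacement is a literal backslash followed by the matched character.
    const std::string rep("\\\\&");
    std::string result = regex_replace(url_to_escape, esc, rep,
                                       boost::match_default | boost::format_sed);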

(The flag boost::format_sed tells Boost to interpret the replacement string the way sed does. In sed, an unescaped & outputs whatever was matched by the whole expression.)

Or, if you are not comfortable with sed's replacement string format, just change the flag to boost::format_perl, and you can use the familiar $& to refer to whatever was matched by the whole expression.

    const std::string rep("\\\\$&");
    std::string result = regex_replace(url_to_escape, esc, rep,
                                       boost::match_default | boost::format_perl);

4 Comments

I tried using a regex to do it, but I'm still fairly incompetent, and strange things were occurring :p I've ordered a couple of books on regex today, so hopefully my ignorance will be short-lived! In the meantime, using a regular string replacement to escape these characters worked for my immediate needs, thanks.
I added some code to my answer that I think should work to add a backslash before any character that needs to be escaped. I haven't used boost in a while though so no guarantees.
It was close, just had to add a "&" to the end of rep and it worked. Thanks.
Btw, since C++11 we can also use std::regex. Unfortunately, GCC 4.8 has many regex bugs, and even with GCC 7 the sed-style replacement does not work correctly. This was fixed in GCC 8: gcc.gnu.org/bugzilla/show_bug.cgi?id=83601
14

Using the code from Dav (plus the fix from the comments), I created a regex_escape() function. The Unicode (wide-character) version is shown here:

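    std::wstring regex_escape(const std::wstring& string_to_escape) {
        // Matches any single character that is special in a regex.
        // Wide literals (L"...") are used so this compiles with or without UNICODE.
        static const boost::wregex re_boostRegexEscape(L"[.^$|()\\[\\]{}*+?\\\\]");
        // The replacement is a literal backslash followed by the matched character.
        const std::wstring rep(L"\\\\&");
        return regex_replace(string_to_escape, re_boostRegexEscape, rep,
                             boost::match_default | boost::format_sed);
    }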

For the ASCII version, use std::string/boost::regex instead of std::wstring/boost::wregex.


4

Same with boost::xpressive:

    // Captures any single special character in group 1.
    const boost::xpressive::sregex re_escape_text =
        boost::xpressive::sregex::compile(
            "([\\^\\.\\$\\|\\(\\)\\[\\]\\*\\+\\?\\/\\\\])");

    std::string regex_escape(std::string text) {
        // Put a backslash in front of whatever group 1 captured.
        text = boost::xpressive::regex_replace(text, re_escape_text,
                                               std::string("\\$1"));
        return text;
    }


1

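In C++11, you can use raw string literals so that the backslashes in the regex do not have to be doubled for the C++ compiler (regex metacharacters such as the dot still need their single backslash):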

std::string myRegex = R"(something\.com)";

See http://en.cppreference.com/w/cpp/language/string_literal, item (6).
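Raw string literals also tidy up the escaping pattern from the other answers. Below is a rough sketch that swaps Boost for C++11 std::regex, so it is not the Boost API and it assumes a standard library with a working `<regex>` (for GCC that means 4.9 or later). The name `regex_escape_std` is made up for this example. It relies on the default ECMAScript replacement format, where `$&` stands for the whole match and a backslash is copied through unchanged.

```cpp
#include <regex>
#include <string>

// Hypothetical helper built on std::regex instead of Boost.Regex.
std::string regex_escape_std(const std::string& text) {
    // Raw string: this is exactly the pattern the regex engine sees,
    // a character class listing every special character.
    static const std::regex special(R"([.^$|()\[\]{}*+?\\])");

    // Replacement: a literal backslash followed by the matched character.
    return std::regex_replace(text, special, R"(\$&)");
}
```

Compare the pattern with the doubled-up "[.^$|()\\[\\]{}*+?\\\\]" needed in an ordinary string literal. The escaped result can be passed straight to a std::regex constructor, and that regex should then match the original text literally.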
