
I would like to parse a sentence where some strings may be unquoted, 'quoted' or "quoted". The code below almost works, but it fails to match closing quotes. I'm guessing this is because of the qq reference. A modification is commented out in the code; enabling it results in "quoted' or 'quoted" also parsing, which helps show that the original problem is with the closing quote. The code also describes the exact grammar.

To be completely clear: unquoted strings parse. A quoted string like 'hello' will parse the open quote ', all the characters hello, but then fail to parse the final quote '.

I made another attempt, similar to the begin/end tag matching in the Boost tutorials, but without success.

    template <typename Iterator>
    struct test_parser : qi::grammar<Iterator, dectest::Test(), ascii::space_type>
    {
        test_parser() : test_parser::base_type(test, "test")
        {
            using qi::fail;
            using qi::on_error;
            using qi::lit;
            using qi::lexeme;
            using ascii::char_;
            using qi::repeat;
            using namespace qi::labels;
            using boost::phoenix::construct;
            using boost::phoenix::at_c;
            using boost::phoenix::push_back;
            using boost::phoenix::val;
            using boost::phoenix::ref;
            using qi::space;

            char qq;

            arrow = lit("->");

            open_quote = (char_('\'') | char_('"')) [ref(qq) = _1];  // Remember what the opening quote was
            close_quote = lit(val(qq));                              // Close must match the open
            // close_quote = (char_('\'') | char_('"'));             // Enable this line to get code 'almost' working

            quoted_string = open_quote >> +ascii::alnum >> close_quote;

            unquoted_string %= +ascii::alnum;

            any_string %= (quoted_string | unquoted_string);

            test =
                  unquoted_string            [at_c<0>(_val) = _1]
                > unquoted_string            [at_c<1>(_val) = _1]
                > repeat(1,3)[any_string]    [at_c<2>(_val) = _1]
                > arrow
                > any_string                 [at_c<3>(_val) = _1]
                ;

            // .. <snip>set rule names
            on_error<fail>(/* <snip> */);
            // debug rules
        }

        qi::rule<Iterator> arrow;
        qi::rule<Iterator> open_quote;
        qi::rule<Iterator> close_quote;

        qi::rule<Iterator, std::string()> quoted_string;
        qi::rule<Iterator, std::string()> unquoted_string;
        qi::rule<Iterator, std::string()> any_string;     // A quoted or unquoted string

        qi::rule<Iterator, dectest::Test(), ascii::space_type> test;
    };

    // main()
    // This example should fail at the very end
    // (ie not parse "str3' because of the mismatched quote
    // However, it fails to parse the closing quote of str1
    typedef boost::tuple<string, string, vector<string>, string> DataT;
    DataT data;
    std::string str("addx001 add 'str1' \"str2\" -> \"str3'");
    std::string::const_iterator iter = str.begin();
    const std::string::const_iterator end = str.end();
    bool r = phrase_parse(iter, end, grammar, boost::spirit::ascii::space, data);

For bonus credit: a solution that avoids a local data member (such as char qq in the above example) would be preferred, but from a practical point of view I'll use anything that works!

  • For the record, making char qq a member variable of struct test_parser fails in exactly the same way. Commented Apr 24, 2012 at 0:11
  • Fails in what "same way"? You haven't told us how this one fails (though I can imagine it is due to the qq reference). Commented Apr 24, 2012 at 0:16
  • @NicolBolas It was a comment in the code; I've since clarified the question, thanks for pointing that out. I also suspect the ref(qq), but the downside of Boost.Lambda & co. is that they are tricky to debug, since you can't step through them in the traditional sense! Commented Apr 24, 2012 at 0:30

1 Answer


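The reference to qq becomes dangling after leaving the constructor, so that is indeed a problem. Note also that close_quote = lit(val(qq)) compares against a copy of qq taken when the rule is built, not against the value later stored by the opening-quote action, which is why making qq a member alone does not fix it.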

qi::locals is the canonical way to keep local state inside parser expressions. Your other option would be to extend the lifetime of qq (e.g. by making it a member of the grammar class). Lastly, you might be interested in inherited attributes as well: this mechanism lets you call a rule/grammar with 'parameters' (passing local state around).

NOTE: There are caveats with the use of the one-or-more operator + (and the Kleene star *): they are greedy, and parsing fails if the string is not terminated with the expected quote.

See another answer I wrote for more complete examples of handling arbitrary contents in (optionally/partially) quoted strings, including escaping of quotes inside quoted strings and similar things:

I've reduced the grammar to the relevant bit, and included a few test cases:

    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    #include <boost/fusion/adapted.hpp>
    #include <iostream>
    #include <string>

    namespace qi = boost::spirit::qi;

    template <typename Iterator>
    struct test_parser : qi::grammar<Iterator, std::string(), qi::space_type, qi::locals<char> >
    {
        test_parser() : test_parser::base_type(any_string, "test")
        {
            using namespace qi;

            quoted_string =
                   omit    [ char_("'\"") [_a = _1] ]   // remember the opening quote in local _a
                >> no_skip [ *(char_ - char_(_a)) ]    // any text up to that same quote
                >> lit(_a)                             // the closing quote must match
                ;

            // lexeme: don't let the skipper join "foo bar" into "foobar"
            any_string = quoted_string | lexeme [ +qi::alnum ];
        }

        qi::rule<Iterator, std::string(), qi::space_type, qi::locals<char> > quoted_string, any_string;
    };

    int main()
    {
        test_parser<std::string::const_iterator> grammar;
        const char* strs[] = { "\"str1\"",
                               "'str2'",
                               "'str3' trailing ok",
                               "'st\"r4' embedded also ok",
                               "str5",
                               "str6'",
                               NULL };

        for (const char** it = strs; *it; ++it)
        {
            const std::string str(*it);
            std::string::const_iterator iter = str.begin();
            std::string::const_iterator end  = str.end();

            std::string data;
            bool r = qi::phrase_parse(iter, end, grammar, qi::space, data);

            if (r)
                std::cout << "Parsed: " << str << " --> " << data << "\n";
            if (iter != end)
                std::cout << "Remaining: " << std::string(iter, end) << "\n";
        }
    }

Output:

    Parsed: "str1" --> str1
    Parsed: 'str2' --> str2
    Parsed: 'str3' trailing ok --> str3
    Remaining: trailing ok
    Parsed: 'st"r4' embedded also ok --> st"r4
    Remaining: embedded also ok
    Parsed: str5 --> str5
    Parsed: str6' --> str6
    Remaining: '

Comments

Thanks, this is exactly what I was after. Would you be able to post a link to any documentation/examples about locals? It took me a while to notice the qi::locals<char> in the rule signature, and it would be a good reference for me and anyone else looking at this question.
@Zero thanks! And, erm qi::locals was a hyperlink in my answer :) - click it for documentation
@Zero For a good sample, I'd refer to the page you linked to in your question, notably here: One More Take
Aha, got it - at the bottom of One More Take they talk about the 'locals' template parameter. Thanks again.
Slightly improved the parsing of the string literal (it now accepts any text within the quotes). Now also with a fixed test.
