
I would like to parse the string

    std::string entry = "127.0.0.1 - [16/Aug/2012:01:50:02 +0000] \"GET /check.htm HTTP/1.1\" 200 17 \"AgentName/0.1 libwww-perl/5.833\"";

with the following rules:

    ip_rule        %= lexeme[(+char_("0-9."))[ref(ip) = _1]];
    timestamp_rule %= lexeme[('[' >> +(char_ - ']') >> ']')[ref(timestamp) = _1]];
    user_rule      %= lexeme[(+char_)[ref(user) = _1]];
    request_rule   %= lexeme[('"' >> +(char_ - '"') >> '"')[ref(req) = _1]];
    referer_rule   %= lexeme[('"' >> +(char_ - '"') >> '"')[ref(referer) = _1]];

    bool r = phrase_parse(first, last,
        ip_rule >> user_rule >> timestamp_rule >> request_rule
            >> uint_[ref(status) = _1] >> uint_[ref(transferred_bytes) = _1]
            >> referer_rule,
        space);

but it does not match. If I remove the "-" from the string (and, of course, the user_rule), then it matches. Could you please advise how to match the string with the "-"?

  • The rules have the following type: rule<Iterator, std::string(), space_type> ip_rule, timestamp_rule, user_rule, request_rule, referer_rule; (commented Aug 23, 2012 at 9:14)

1 Answer


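Your user_rule "eats" the rest of the text: inside lexeme[] the skipper is turned off, and +char_ matches any character, including spaces, so it runs to the end of the input and nothing is left for the following rules. Define it like this instead: +~qi::char_("["), so that it stops at the '[' character. The following code works as expected: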

    #include <boost/spirit/include/qi.hpp>
    #include <string>

    using namespace boost::spirit::qi;

    int main()
    {
        std::string ip, user, timestamp, req, referer;
        unsigned status, transferred_bytes;
        std::string entry = "127.0.0.1 - [16/Aug/2012:01:50:02 +0000] \"GET /check.htm HTTP/1.1\" 200 17 \"AgentName/0.1 libwww-perl/5.833\"";

        bool r = phrase_parse(entry.begin(), entry.end(),
            lexeme[+char_("0-9.")]                         // ip
            >> +~char_("[")                                // user: everything up to '['
            >> lexeme[('[' >> +~char_("]") >> ']')]        // timestamp
            >> lexeme[('"' >> +~char_("\"") >> '"')]       // request
            >> uint_                                       // status
            >> uint_                                       // transferred bytes
            >> lexeme[('"' >> +~char_("\"") >> '"')],      // referer
            space,
            ip, user, timestamp, req, status, transferred_bytes, referer);

        return r ? 0 : 1;  // non-zero exit status if the line did not match
    }

Comments

Actually, my goal is to be able to change the order of the rules. This is an entry from an access log, so the field order can change depending on the LogFormat directive in the web server's configuration. How can this be solved with Boost.Spirit?
@user777377 I'm not sure I understand your question or how it relates to the original one. Could you provide an example? If it's a different question, please post it separately.
e.g.: std::string s1 = "127.0.0.1 - [16/Aug/2012:01:50:02 +0000]"; std::string s2 = "127.0.0.1 [16/Aug/2012:01:50:02 +0000] -";
@user777377 Well, you have to define how the IP is separated from the timestamp. For instance, the definition might be "any characters except '[', or nothing". In that case, just change +~char_("[") in my code above to *~char_("["), since * also accepts zero characters.
Please ignore my previous comment. For example:

    std::string s1 = "127.0.0.1 - [16/Aug/2012:01:50:02 +0000]";
    std::string s2 = "127.0.0.1 [16/Aug/2012:01:50:02 +0000] -";
    std::vector<rules> input1{ip, user, timestamp};
    configurable_smart_parser(input1, s1);
    std::vector<rules> input2{ip, timestamp, user};
    configurable_smart_parser(input2, s2);

I want to build something similar to this configurable_smart_parser.