Unable to parse SQL type where condition using boost::spirit::qi

Question

I may be asking a very trivial question but am not getting blocks out of my brain to crack it. Trying to parse a SQL like where clause as shown below using boost::spirit::qi to generate a vector of pairs

std::string input = "book.author_id = '1234' and book.isbn = 'xy99' and book.type = 'abc' and book.lang = 'Eng'"

I have gone through the following threads but still unable to do it :-( Thread5 Thread4 Thread3 Thread2 Thread1

[Thread1][6] [Thread2][7] [Thread3][8] [Thread4][9] [Thread5][10]

I genuinely request, kindly help me understand how to achieve this ... may be I had not completely given my 100% but please be kind ....

Here is the full code (some part commented which I wish to do), as a first step I was just checking if I can get all tokens in a Vector and then parse each Vector element to generate another vector of std::pair

#include <boost/fusion/adapted.hpp> #include <boost/spirit/include/qi.hpp> #include <map> #include <vector> namespace qi = boost::spirit::qi; namespace phx = boost::phoenix; typedef std::string str_t; typedef std::pair<str_t, str_t> pair_t; typedef std::vector<pair_t> pairs_t; typedef std::vector<str_t> strings_t; //typedef std::map<std::string, std::string> map_t; //typedef std::vector<map_t> maps_t; template <typename It, typename Skipper = qi::space_type> //struct parser : qi::grammar<It, pairs_t(), Skipper> struct parser : qi::grammar<It, strings_t(), Skipper> { parser() : parser::base_type(start) { using namespace qi; cond = lexeme [ *(char_) ]; conds = *(char_) >> cond % (lit("and")); //conds = *(char_ - lit("and")) >>(cond % lit("and")); /*cond = lexeme [ *(char_ - lit("and")) ]; cond = key >> "=" >> value; key = *(char_ - "="); value = ('\'' >> *(~char_('\'')) >> '\''); kv_pair = key >> value;*/ start = conds; //cond = key >> "=" >> value; //key = *(char_ - "="); //value = ('\'' >> *(~char_('\'')) >> '\''); // kv_pair = key >> value; // start = kv_pair; } private: qi::rule<It, str_t(), Skipper> cond; qi::rule<It, strings_t(), Skipper> conds; //qi::rule<It, std::string(), Skipper> key, value;//, cond; //qi::rule<It, pair_t(), Skipper> kv_pair; //qi::rule<It, pairs_t(), Skipper> start; qi::rule<It, strings_t(), Skipper> start; }; template <typename C, typename Skipper> bool doParse(const C& input, const Skipper& skipper) { auto f(std::begin(input)), l(std::end(input)); parser<decltype(f), Skipper> p; strings_t data; try { bool ok = qi::phrase_parse(f,l,p,skipper,data); if (ok) { std::cout << "parse success\n"; std::cout << "No Of Key-Value Pairs= "<<data.size()<<"\n"; } else std::cerr << "parse failed: '" << std::string(f,l) << "'\n"; return ok; } catch(const qi::expectation_failure<decltype(f)>& e) { std::string frag(e.first, e.last); std::cerr << e.what() << "'" << frag << "'\n"; } return false; } int main() { std::cout<<"Pair Test \n"; const std::string input = "book.author_id = '1234' and book.isbn = 'xy99' and book.type = 'abc' and book.lang = 'Eng'"; bool ok = doParse(input, qi::space); std::cout<< input <<"\n"; return ok? 0 : 255; }

OUTPUT:

Pair Test parse success No Of Key-Value Pairs= 2 book.author_id = '1234' and book.isbn = 'xy99' and book.type = 'abc' and book.lang = 'Eng'

Which I expect 4 ... since there are 4 conditions !!

Thanks in Advance Regards, Vivek

some example to work out- live on coliru

Perhaps also of interest: Parsing SQL Queries in C++ using Boost.Spirit and Attempting to parse SQL like statement with Boost-Spirit. — sehe
– sehe, Commented May 7, 2014 at 20:10

sehe · Accepted Answer · 2014-05-08 08:39:30Z

I'm sorry to break it to you, but your grammar is far more broken than you imagined.

 conds = *(char_) // ...

Right here, you're basically just parsing all the input into a single string, with whitespace skipped. In fact, adding

 for (auto& el : data) std::cout << "'" << el << "'\n";

after parsing prints:

Pair Test parse success No Of Key-Value Pairs= 2 'book.author_id='1234'andbook.isbn='xy99'andbook.type='abc'andbook.lang='Eng'' ''

As you can see, the first element is the string that *char_ parsed, and you get an empty element for free due to the fact that both conds and cond match on empty input.

I would strongly suggest you to start simple. And I mean, much simpler.

Slowly build your grammar up from the ground. Spirit is a very good tool to tackle with test-driven development (except for the compile times, but hey, you get more time to think!).

Here's something that I just made up, starting thinking from the very first building block, the indentifier, and working my way up to the higher-level elements:

// lexemes (no skipper) ident = +char_("a-zA-Z."); op = no_case [ lit("=") | "<>" | "LIKE" | "IS" ]; nulllit = no_case [ "NULL" ]; and_ = no_case [ "AND" ]; stringlit = "'" >> *~char_("'") >> "'"; // other productions field = ident; value = stringlit | nulllit; condition = field >> op >> value; conjunction = condition % and_; start = conjunction;

These are close to the simplest thing that I suppose could parse your grammar (with a few creative notes left and right, where they don't seem too intrusive).

UPDATE So this is where I got in 20 minutes:

I always start out with mapping the types that I want the rules to expose:

namespace ast { enum op { op_equal, op_inequal, op_like, op_is }; struct null { }; typedef boost::variant<null, std::string> value; struct condition { std::string _field; op _op; value _value; }; typedef std::vector<condition> conditions; }

Only condition cannot be "naturally" used in a Spirit grammar without adaptation:

BOOST_FUSION_ADAPT_STRUCT(ast::condition, (std::string,_field)(ast::op,_op)(ast::value,_value))

Now comes the grammar itself:

 // lexemes (no skipper) ident = +char_("a-zA-Z._"); op_token.add ("=", ast::op_equal) ("<>", ast::op_inequal) ("like", ast::op_like) ("is", ast::op_is); op = no_case [ op_token ]; nulllit = no_case [ "NULL" >> attr(ast::null()) ]; and_ = no_case [ "AND" ]; stringlit = "'" >> *~char_("'") >> "'"; //// other productions field = ident; value = stringlit | nulllit; condition = field >> op >> value; whereclause = condition % and_; start = whereclause;

You can see minor deviations from my original sketch, that's interesting:

added _ to identifier chars
move op_token into a symbol matcher (because that's easier to map the enum values)

See it all Live And Working On Coliru, output:

Pair Test parse success No Of Key-Value Pairs= 4 ( [book.author_id] = 1234 ) ( [book.isbn] LIKE xy99 ) ( [book.type] = abc ) ( [book.lang] IS NULL ) book.author_id = '1234' and book.isbn liKE 'xy99' and book.type = 'abc' and book.lang IS null

Thanks Much Sehe for exploring the details ... I guess I had put a very very naive effort and your post is an eye opener .. will think more attentively and work it out to build a cleaner, simpler grammer ... Regards Vivek
pardon for disturbing again .. but the implementation cannot parse LIKE or IS if provided instead of '=' !! whereas if I break tokens as "and_" then use "|" in "field >> (op_like | op_is) >> value" then it works .. is there any glitch which is hidden .. Regards Vivek
Posted a request for comments at the mailing list (minimal reproducer included) and updated the answer.
Oops. It was documented! We need to put the symbols entries in lowercase for use in no_case. Glad that's resolved :)
thanks again, its all working and I really appreciate your diligent efforts

Collectives™ on Stack Overflow

Unable to parse SQL type where condition using boost::spirit::qi

1 Answer 1

18 Comments

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

18 Comments

Linked

Related