I have a lexer with a token that is enabled or disabled by a boolean property.
I create an input stream and parse some text. My token is called PHRASE_TEXT and should match anything that fits this pattern: '"' ('\\' ~[] |~('\"'|'\\')) '"' {phraseEnabled}?
I tokenize "foo bar" and, as expected, I get a single token. After setting the property to false on the lexer and calling setInputStream on it with the same text, I get "foo , bar", so two tokens instead of one. This is also the expected behavior.
The problem comes when I set the property to true again. I would expect the same text to tokenize to the single token "foo bar", but instead it is tokenized into the two tokens from before. Is this a bug on my part? What am I doing wrong here? I tried both creating new instances of the lexer and reusing the same instance, but it doesn't work either way. Thanks in advance.
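To make the expected behavior concrete, here is a minimal sketch in plain Java. It is not the ANTLR-generated lexer: the class PhraseToggleSketch, its tokenize method, and the whitespace-splitting fallback are hypothetical stand-ins written only to model the toggle sequence described above, under the assumption that a quote character stays attached to a term when phrases are disabled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the lexer; models the expected behavior only, not ANTLR. */
public class PhraseToggleSketch {
    /** Plays the role of the boolean property that enables the phrase token. */
    public boolean phrases = true;

    /** Splits text into tokens; a quoted phrase stays whole only while phrases is true. */
    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            if (phrases && c == '"') {
                int end = closingQuote(text, i + 1);
                if (end > i + 1) { // require at least one character between the quotes
                    tokens.add(text.substring(i, end + 1));
                    i = end + 1;
                    continue;
                }
            }
            // Fallback: read a whitespace-delimited term (assumption of this sketch).
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                if (text.charAt(i) == '\\' && i + 1 < text.length()) {
                    i++; // skip the escaped character
                }
                i++;
            }
            tokens.add(text.substring(start, i));
        }
        return tokens;
    }

    /** Returns the index of the first unescaped quote at or after from, or -1 if none. */
    private static int closingQuote(String text, int from) {
        for (int j = from; j < text.length(); j++) {
            char c = text.charAt(j);
            if (c == '\\') {
                j++; // skip the escaped character
                continue;
            }
            if (c == '"') {
                return j;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        PhraseToggleSketch lexer = new PhraseToggleSketch();
        String input = "\"foo bar\"";
        System.out.println(lexer.tokenize(input)); // flag on
        lexer.phrases = false;
        System.out.println(lexer.tokenize(input)); // flag off
        lexer.phrases = true;
        System.out.println(lexer.tokenize(input)); // flag on again
    }
}
```

In this model the three calls yield one token, then two, then one again. The last result is what I expect from the real lexer but do not get.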
Edit: Part of my grammar follows below.
    grammar LuceneQueryParser;

    @header{package com.amazon.platformsearch.solr.queryparser.psclassicqueryparser;}

    @lexer::members {
        public boolean phrases = true;
    }

    @parser::members {
        public boolean phraseQueries = true;
    }

    mainQ : LPAREN query RPAREN
          | query
          ;

    query : not ((AND|OR)? not)* ;

    andClause : AND ;

    orClause : OR ;

    not : NOT? modifier? clause;

    clause : qualified
           | unqualified
           ;

    unqualified : LBRACK range_in LBRACK
                | LCURL range_out RCURL
                | truncated
                | {phraseQueries}? quoted
                | LPAREN query RPAREN
                | normal
                ;

    truncated : TERM_TEXT_TRUNCATED;

    range_in : (TERM_TEXT|STAR) TO (TERM_TEXT|STAR);

    range_out : (TERM_TEXT|STAR) TO (TERM_TEXT|STAR);

    qualified : TERM_TEXT COLON unqualified ;

    normal : TERM_TEXT;

    quoted : PHRASE_TEXT;

    modifier : PLUS
             | MINUS
             ;

    PHRASE_TEXT : '"' (ESCAPE|~('\"'|'\\'))+ '"' {phrases}?;

    TERM_TEXT : (TERM_CHAR|ESCAPE)+;

    TERM_CHAR : ~(' ' | '\t' | '\n' | '\r' | '\u3000'
                | '\\' | '\'' | '(' | ')' | '[' | ']' | '{' | '}'
                | '+' | '-' | '!' | ':' | '~' | '^'
                | '*' | '|' | '&' | '?' );

    ESCAPE : '\\' ~[];

The problem seems to be that after I set phrases to false and then to true again, no more tokens are recognized as PHRASE_TEXT. I know that, as a guideline, I should define my grammars to be unambiguous, but this is basically the way it has to end up looking: tokenizing a string with quotes in two different modes, depending on the situation.