0

I'm parsing text using Java. I've defined a grammar below:

Start := "(\\<)" Stop := "(\\>)" Var = "(\\w*)"; Cons = "([0-9]*)"; Type1 := Start ((Var | Cons) | TypeParent) (Type1 ((Var | Cons) | TypeParent))* Stop Type2 := Start ((Var | Cons) | TypeParent) (Type2 ((Var | Cons) | TypeParent))* Stop TypeParent := Type1 | Type2 ... etc 

I want to combine all the regexes into a single String pattern and match all at once. My problem is when I start using the recursive grammar elements in the Type1 and Type2 lines. I obviously can't feed a recursive definition into a Pattern in Java - it's just a String with the regex symbols.

What I'd like is that I could somehow have a logical switch that said that if in this block:

(Type2 ((Var | Cons) | TypeParent) 

all patterns were matched except Type2, that I could capture all the other groups, but then extract the string of characters where Type2 token should be and then recursively feed it into the regexer again. Eventually I'd get down to a base case of:

(Var | Cons) | TypeParent) 

I realize that this isn't what regex is meant to do - this is now a context free grammar (?) since it is recursive. But short of thinking of a super clever parser, I think this method is hackable.

Thoughts?

2 Answers 2

5

You are correct. This isn't what regular expressions are meant to do. The instant you introduce recursion you are out of the land of regular expressions, DFAs, and into the land of context-free languages, DPDAs, parsers. You need a stack to handle recursion. And no, it isn't hackable.

There's nothing 'super-clever' about the parser for this grammar. It is considerably simpler than what you've done so far.

Sign up to request clarification or add additional context in comments.

2 Comments

@lollercoaster A grammar with three productions isn't complex. You don't need a state machine at all. You can do it in recursive descent. I'd advise you to stop guessing and start learning what these techniques really mean.
you are correct. recursive descent was pretty easy. implementation with the shunting yard algorithm -> RPN -> AST tree did exactly the trick. thanks!
3

It's so much easier to use the right tool for the job. Try CUP, ANTLR, or JavaCC. Here is an ANTLR example.

1 Comment

JParsec looks cool. Pure Java, no grammar template required. Thanks!

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.