1

I have a pretty standard recursive descent parser. The algorithm is straightforward: each function reads characters from a stream and either returns FAIL or calls subsequent parse functions (specified by a corresponding grammar rule):

function parseFoo() { var ret = parseBar(stream); if (ret == FAIL) { ret = parseQuax(stream); ... } ... return ret; } 

I would like to solve a situation where I don't have the full stream at once - I get its pieces asynchronously. Therefore what I need is a feature of "interruptability" - I must be able to save a state of the parser and later continue from that point on.

Sadly, this is one thing that is impossible to do with nested function calls, because all the state of a parser is "stored" on the call stack (in the order of functions and in their local variables).

So I have to construct the parse* functions differently.

The language I use is JavaScript.

Can someone point me o any ideas how to proceed?

EDITs:

  • It seems clear to me that I need some sort of state machine(s). I can't wrap my head around generators or continuations-passing-style, it looks to me that there are so many glitches in the saving of a state and resuming. To me most clear pathway I can think of is to convert the nested calls into some sort of stack, rewrite the local variables to some hashmap stored at the stack item and construct the parsing code in a different, linear fashion so I can easily "goto" to some state.

  • One of the sub-problems I see is might be this: Since I don't have the full stream I think I have to try multiple paths and store all the partially-parsed attempts. For example if the grammar says a = b | c then b might be so long that it is not fully in the stream. So I can't "hang" in parsing b, I have to try both and save the partial computations. Am I right?

4
  • 1
    Actually, this is straightforward in some languages. Which language are you using? Commented Apr 17, 2014 at 10:53
  • 1
    If you can rely on relatively recent language additions, you could try using generators to build a sort of coroutine. Commented Apr 17, 2014 at 10:57
  • Shouldn't if (ret == FAIL) actually be if (ret != FAIL)? Commented Apr 17, 2014 at 11:01
  • 1
    Depends on the grammar rule. It might be foo = bar quax or foo = bar | quax. Anyway, the code isn't some precise example at all. Commented Apr 17, 2014 at 11:03

1 Answer 1

1

It depends upon the programming language used.

You basically need some support of continuations (which you could understand as the abstraction of the call stack). In other words, you may want to code in continuation passing style (CPS).

If coding in C, you might be interested by CPC.

If coding in Javascript, you'll probably need to hand-code in CPS. See also this, this, this question, this pages etc....

Sign up to request clarification or add additional context in comments.

3 Comments

I may be biased against coroutines, but IMHO full on continuation passing style is overkill and too complex. One only needs to resume at a single point, not an arbitrary number as e.g. when backtracking. In other words, a generator, not even a coroutine, let alone a full continuation. For that, a simple-ish state machine should work just as well, with less code and indirection.
But CPS is easy to code in. Of course, you'll need to think of at at the start.
You see, that's the part I'm biased about. I'd have to think really hard about how to map all that control flow to CPS, whereas for a state machine, I'd write generator pseudo-yields and simply slap some mechanical transformations (explicit state argument, switch to find resume position, move locals into state argument) around that. Of course, someone who often writes CPS code and rarely state machines would be in the reverse position.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.