Based on your question I'm assuming that you're writing your own recursive descent parser for an indentation-sensitive language.
I've experimented with indentation-based languages before, and I solved the problem by having a state that keeps track of the current indentation level and two different terminals that match indentation. Both of them match indentation units (say two spaces or a tab) and count them. Let's call the matched indentation level matched_indentation and the current indentation level expected_indentation.
For the first one, let's call it indent:
- if
matched_indentation < expected_indentation, this is a dedent, and the match is a failure. - if
matched_indentation == expected_indentation, the match is a success. The matcher consumes the indentation. - if
matched_indentation > expected_indentation, you have a syntax error (indentation out of nowhere) and should handle it as such (throw an exception or something).
For the second one, let's call it dedent:
if matched_indentation < expected_indentation, the match is successful. You reduce expected_indentation by one, but you don't consume the input. This is so that you can chain multiple dedent terminals to close multiple scopes.
if matched_indentation == expected_indentation, the match is successful, and this time you do consume the input (this is the last dedent terminal, all scopes are closed).
if matched_indentation > expected_indentation, the match simply fails, you don't have a dedent here.
Those terminals and non-terminals after which you expect an increase in indentation should increase expected_indentation by one.
Let's say that you want to implement a python-like if statement (I'll use EBNF-like notation), it would look something like this:
indented_statement : indent statement newline; if_statement : 'if' condition ':' newline indented_statement+ dedent ;
Now let's look at the following piece of code, and also assume that an if_statement is a part of your statement rule:
1|if cond1: <- expected_indentation = 0, matched_indentation = 0 2| if cond2: <- expected_indentation = 1, matched_indentation = 1 3| statement1 <- expected_indentation = 2, matched_indentation = 2 4| statement2 <- expected_indentation = 2, matched_indentation = 2 5| <- expected_indentation = 2, matched_indentation = 0
- On the first four lines you'll successfully match an
indent terminal - On the last line, you'll match two
dedent terminals, closing both the scopes, and resulting with expected_indentation = 0
One thing you should be careful of is where you put your indent and dedent terminals. In this case, we don't need one in the if_statement rule because it is a statement, and indented_statement already expects an indent.
Also mind how you treat newlines. One choice is to consume them as a sort of statement terminator, another is to have them precede the indentation, so choose whichever suits you best.