1,096 questions
1 vote, 1 answer, 51 views
How to parse optional separator that can be part of formatted text?
I'm currently trying to parse a custom configuration format using ANTLR4. Here is what the input may look like (in reality it's a lot more technical, but I had to change it for SO because I want to keep my ...
1 vote, 0 answers, 75 views
How does the lexer handle tokens that are concatenated together?
Let me give you an example. In a language like Java, we have code like this: class className { } In this example, if className is concatenated to class, and also className is concatenated to {, how ...
1 vote, 1 answer, 85 views
Is it safe to use a mode-switch rule that matches nothing?
I’m working on a custom Javadoc lexer in ANTLR 4, and I’m trying to handle special mode switching when I reach the start of a line. I have a simplified example like this: mode START_OF_LINE; ...
2 votes, 1 answer, 96 views
Capturing function parameters and optional type hints using regex
I'm trying to capture the optional type hints that come after parameter names in my own toy language so I can give them a different color; I'm using a tmLanguage.json file to create the syntax ...
0 votes, 1 answer, 58 views
Is there a programmatic way to determine which ANTLR lexer a given ANTLR parser uses?
Let's assume I have a given ANTLR parser assembly with many Parser and Lexer classes contained within. Is there a programmatic way to determine which of the lexer classes a given parser class ...
0 votes, 1 answer, 75 views
Can't correctly split the input string into tokens when they run into each other
When there is no separation between fractions, the lexer merges everything into one token that the FRACTION rule doesn't recognize: 2|1+4|15|2-18|5 line 1:11 no viable alternative at input '2|1+4|15|2-18|5' 2|...
0 votes, 1 answer, 51 views
Designing a DFA for a Lexer: Shared vs. Separate Character Nodes
When building the DFA for the lexer of my programming language, should each character (e.g., n, i, f) appear as a single shared node across all token paths, or should I allow duplicate nodes for the ...
0 votes, 1 answer, 50 views
ANTLR4 grammar for single-quoted strings with embedded single quotes
I am parsing an old language (PL/I) that uses single quotes for strings and allows an embedded single quote to be written as two single quotes, for instance: 'This is a normal string' 'This contains '' ...
-1 votes, 1 answer, 92 views
Syntax error at ',' in PLY Pascal-like interpreter
I'm working on a simple Pascal-like interpreter using PLY, but I'm encountering a syntax error at ',' during parsing. The issue arises when I try to parse a source file that includes commas. Below is ...
1 vote, 1 answer, 86 views
Why do lexers usually define a variable name as not being able to start with a number?
What's the difference between the tokens _123jh and 123jh that makes most lexers disallow identifiers starting with a number? I suppose one reason might be that a number-only token might be confusing, and ...
-1 votes, 1 answer, 164 views
Should an escaped unicode hex value '\u0000' in a JSON string be validated by the lexer or the parser?
Preamble: I am creating my own JSON lexer and eventually a full parser, purely as a learning experience because that is what I enjoy doing. As I understand it, the lexer's job is to tokenize the data (...
0 votes, 1 answer, 26 views
Antlr4 lexer seems to have a problem processing token 'AX', and no semantic predicate runs on rule REG
In the following example, the input token 'AX' seems to cause errors for an unknown reason. The parse tree shows that other rule matches that contain register tokens such as 'DX' are working fine. I'...
1 vote, 1 answer, 52 views
Granularity of tokens for a lexer
I want to build a little lexer and parser by myself. I want the lexer to produce a vector of tokens that I feed into the parser later. Now I'm thinking about what belongs in which stage. Let's look at ...
0 votes, 0 answers, 57 views
Regex Derivative DFA: This Should Work, But It's Breaking in Unexpected Ways
I’m working on a DFA-based lexer using regex derivatives for tokenizing lexemes. I've built a setup that, theoretically, should handle regex simplification and DFA transitions accurately. For the most ...
1 vote, 0 answers, 40 views
C++ code doesn't give any output and gets stuck
I am making a lexer and parser for an 8-bit CPU. My lexer is working fine, but when I added an AST class for the parser, this problem started. What's the problem, and how do I solve it? The code takes a string ...