2

I'm trying to split up an equation string into tokens. Ive found a good starting point '([A-Za-z]+|[0-9.]+|[&=><\|!]+|\S)'. However this has trouble with negative numbers:

turns: '5--4=sin(2+3)' into: ['5','-','-','4','=','sin','(','2','+','3',')'] want: ['5','-','-4','=','sin','(','2','+','3',')'] 

and also

turns: -3+3 into: ['-','3','+','3'] want: ['-3','+','3'] 

It looks like a my regex could use something that checks if there is a number to the left of the '-' if not keep it with the next number(note '-3' has nothing to the left). Can it be done using regex? Or is there a better tool to split this up in .NET?

2 Answers 2

2

You are not approaching the problem correctly. The result you actually got is the correct one.

-3+3 should parse to:

operator binary + | +-- operator unary - | | | +-- 3 | +-- 3 

It will be much easier to reason about math expressions this way, you'll avoid many ambiguities. Let just - always be a token on its own, and use it either as a binary minus, or an unary negation operator.

See here for a related answer of mine which approaches the problem this way (it uses ANTLR but the lexing pass does exactly what I'm advising you to do).

Sign up to request clarification or add additional context in comments.

Comments

1

Regex is not powerful enough to do what you want in all contexts. Although you can make regex recognize + or - as part of an integer literal, for example, by adding an optional [+-]? in front of a digit sequence, the resultant regex would opt to tokenize '-3+3' as ['-3', '+3'] (demo).

Using a lexer generator should fix this problem; alternatively, you can deal with "bundling" unary operators with their operands in the parser.

7 Comments

Oh come on, regex is perfectly suited for lexing - it's a Chomsky type 3 problem. OP doesn't realize that the result he got is actually exactly what he needs. in - 3, the - is actually the unary negation operator.
@LucasTrzesniewski Of course, regex is perfectly suited for lexing, but OP wants his lexing to be context-sensitive. He wants two minuses in -3-3 treated differently, which is neither what he wants nor what regex can deliver.
Yes, after reading your answer for a second time now I get what you meant. BTW a lexer generator won't just magically fix the "problem" either, most just use regex under the hood ;)
@LucasTrzesniewski I had ANTLR 3 in mind: I remember it producing some nice "magic" for me in terms of lexing complex expressions. I think they do it by adding an extra layer on "smartness" top of regular expressions, though.
Hmmm... there's no such thing in ANTLR as far as I can tell. It lets you inject code into lexer expressions though by using inline blocks {...} or predicates {...}? (I don't remember if the v3 lexer used {...}? or {...}?=> though), so you can achieve custom magic that way.
|

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.