Is it a good idea to let keywords have different lexical rules from names of types, variables, functions, etc? [closed]

Question

For example, keywords have a special prefix. Objective-C has @interface, @implementation, but that's for compatibility with C. It inherits all the C keywords of course, with no @. How about a language in which all keywords are prefixed with a special character? It will only save the lexer some trivial trouble, but the programmer finally gets to use the words float, class, template ... etc.

Conversely, C# allows @class as a variable name, and via reflection the name can be retrieved as "class" instead of "@class". It's a good compromise, but it must add a little complexity to the lexer.

Oh, what I wouldn't give to be able to use 'float' as a name! Seriously though, the important thing for most languages is the ease with which humans read, write, talk, and reason about them. The ease by which the compiler parses is almost irrelevant, and a modest set of keywords do not normally cause confusion to humans. Whether something is a type or a variable or a keyword is something that can be left to automatic inference and code colouring.
– Steve, Commented Apr 23, 2023 at 10:36
@Steve True. The last point in amon's answer - new keyword may break old code - is a bigger problem. Still barely a problem.
– Eugene, Commented Apr 23, 2023 at 11:14
I'd say it should be barely a problem. Language syntaxes are ideally designed as a system, and there shouldn't be significant incremental additions necessary or possible. Consider C++ for example, which has undergone years of incremental additions - it's completely incoherent.
– Steve, Commented Apr 23, 2023 at 12:59

amon · Accepted Answer · 2023-04-23 09:44:07Z

Distinguishing keywords/operators from user-defined names is not strictly necessary. Scannerless parsers can do just fine regardless. For example, it would be feasible to define a language where the following is a valid expression:

if if then then else else

which could unambiguously mean (in Perl syntax):

($if) ? $then : $else

However, this tends to not be good language design because it can make the code difficult to understand for humans. Similarly, Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo is a valid English sentence, but it is not an example of clear writing.
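To make the point about if if then then else else concrete, here is a minimal Python sketch of how such an expression can be parsed purely by position. The function names parse_conditional and evaluate and the dictionary-based environment are made up for this illustration; this is not a scannerless parser for any real language, only a demonstration that the grammar position is enough to tell a keyword from a name.

```python
def parse_conditional(source):
    """Parse 'if <name> then <name> else <name>' purely by word position."""
    tokens = source.split()
    if len(tokens) != 6:
        raise SyntaxError("expected exactly six words")
    kw_if, cond, kw_then, then_branch, kw_else, else_branch = tokens
    # Words at positions 0, 2 and 4 must be the keywords; words at positions
    # 1, 3 and 5 are treated as names, even if they are spelled like keywords.
    if (kw_if, kw_then, kw_else) != ("if", "then", "else"):
        raise SyntaxError("expected: if <name> then <name> else <name>")
    return {"cond": cond, "then": then_branch, "else": else_branch}


def evaluate(tree, env):
    """Look up the condition variable, then return the chosen branch variable."""
    branch = tree["then"] if env[tree["cond"]] else tree["else"]
    return env[branch]


# Three variables that happen to be named like the keywords.
tree = parse_conditional("if if then then else else")
result = evaluate(tree, {"if": True, "then": "yes", "else": "no"})
```

The parser never needs a list of reserved words to decide what each word means, which is exactly why the result is unambiguous for the machine and still confusing for the reader.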

So there are three general strategies to disambiguate keywords/operators from user-defined names:

Contextual keywords, i.e. a name that would typically be a normal identifier, but plays the role of a keyword in some contexts. This is sometimes seen in languages where new keywords are retrofitted in a backwards-compatible manner, but is not so often a design goal from the start.
Stropping or sigils. Some early languages like ALGOL 68 had a more flexible approach to syntax, with things like underlining or boldface being part of the canonical syntax. Stropping involves extra markup for keywords in case the canonical syntax cannot be expressed. For example, the following might be equivalent, depending on the compiler:

IF cond THEN x = 1 ELSE x = -1 FI
'IF' cond 'THEN' x = 1 'ELSE' x = -1 'FI'
.IF cond .THEN x = 1 .ELSE x = -1 .FI

Stropping is still seen in older Fortran code, for example for operators like .EQ..

Stropping is also featured in some SQL dialects, for example for quoting table and column names that would otherwise clash with SQL keywords (which otherwise behave as contextual keywords).

Perl is a more recent language that has to be discussed in this context. Perl has separate namespaces for operators/subroutines, scalar variables, arrays, and hash tables. These are distinguished by sigil, and to some degree by context. A sigil is a symbol in front of the name, for example a dollar sign for scalar variables (numbers, text). An identifier named if, which is otherwise a keyword, could still be used in the following ways:
- scalar $if
- hash table %if
- array @if
- method $object->if()
- typeglob *if
- explicit function call &if()
- qualified function name ::if()
Some languages use sigils for other purposes. For example, Ruby does not require variables to have sigils, but uses them to denote scope ($global, @instance_field).

A similar concept is "raw identifiers" in some languages, which is mostly useful for ABI compatibility issues. For example, type is a reserved word in Rust, but an identifier with that name can still be written as r#type.
Reserved words. This tends to be the most popular approach. It has the advantage that the syntax looks cleaner than with sigils and that, unlike with contextual keywords, an independent tokenizer becomes possible. However, introducing a new keyword is a backwards-incompatible change, because existing code may already use that word as an identifier.
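
As a rough illustration of the trade-off between reserved words and stropping, here is a minimal Python sketch of two tokenizers. Everything in it is an assumption made up for this example: the keyword set, the dot prefix for stropped keywords, and the function names lex_reserved and lex_stropped do not come from any real language or compiler.

```python
import re

RESERVED = {"if", "then", "else"}             # hypothetical keyword set
BARE_WORD = re.compile(r"[A-Za-z_]\w*")       # words without any prefix
MAYBE_STROPPED = re.compile(r"\.?[A-Za-z_]\w*")  # words with optional dot


def lex_reserved(source, keywords=RESERVED):
    """Reserved-word lexing: a word in the keyword set is always a keyword."""
    return [("KEYWORD" if word in keywords else "IDENT", word)
            for word in BARE_WORD.findall(source)]


def lex_stropped(source):
    """Stropped lexing: only dot-prefixed words are keywords."""
    tokens = []
    for word in MAYBE_STROPPED.findall(source):
        if word.startswith("."):
            tokens.append(("KEYWORD", word[1:]))
        else:
            # Any bare word is a name, so 'float' or 'class' are fine here.
            tokens.append(("IDENT", word))
    return tokens


old_code = "if ready then match else skip"

# With the original keyword set, 'match' is an ordinary identifier.
before = lex_reserved(old_code)

# Reserving 'match' later changes the meaning of the same, unmodified source.
after = lex_reserved(old_code, RESERVED | {"match"})

# With stropping, keywords and names can never collide.
stropped = lex_stropped(".if float .then class .else template")
```

In this sketch the fourth token of old_code is an identifier in before and a keyword in after, which is the backwards-compatibility problem in miniature. The stropped tokenizer avoids it because keywords and names live in separate lexical spaces, at the cost of the extra markup on every keyword.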
