ASCII punctuation characters to basic Latin lowercase alphabet mapping

Question

Languages like Haskell allow you to create your own operators. The following answer explains which punctuation characters are allowed in operators: https://stackoverflow.com/a/10548541/783743

Languages like JavaScript, on the other hand, do not allow punctuation characters (other than $ and _) in variable names. ^[1]

I am writing a compiler which compiles a subset of Haskell to JavaScript and I don't know how to convert the operators into valid JavaScript identifiers.

Hence I decided to map each punctuation character to a basic Latin lowercase letter (i.e. a-z). For example:

& = a
| = l
@ = q

However, instead of deciding the character mapping myself, I first want to know whether anybody else has already done the same thing, or whether there's a standard that specifies the mapping.

I realize that this question could become primarily opinion-based (which for some reason is strictly disallowed on StackOverflow). Hence I'm only looking for canonical answers which state definitively that "this is the way to do it" (perhaps with a link). If you want to opine, you can do so in the comments.

There are currently 19 characters which I wish to map to letters:

! # $ % & * + . / < = > ? @ \ ^ | - ~

Although $ is a valid character in JavaScript identifiers, it would be nice to map it to a letter too.

^[1] Property names can contain special characters, but that's an ugly hack.
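
To make the footnote concrete, here is a minimal TypeScript sketch. The object `HTrue` and its methods are hypothetical and exist only for illustration. It stores the same operation once under the raw operator string and once under a letter-mapped name:

```typescript
// Hypothetical Boolean-like object; the names are illustrative only.
const HTrue = {
  value: true,
  // Any string is a legal property name, but this one is only reachable
  // through bracket syntax: HTrue["&&"](...)
  "&&"(other: { value: boolean }): boolean {
    return this.value && other.value;
  },
  // A letter-mapped name is an ordinary identifier and works with dot syntax.
  aa(other: { value: boolean }): boolean {
    return this.value && other.value;
  },
};
```

Both `HTrue["&&"](HTrue)` and `HTrue.aa(HTrue)` call the same logic; only the access syntax differs.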

The question is: do you want your JS code to be human-readable or not?
– didierc, Commented Jun 18, 2014 at 8:20
@didierc In my opinion, True.aa(True) is more human-readable than True["&&"](True). The latter is more descriptive, but it looks ugly.
– Aadit M Shah, Commented Jun 18, 2014 at 8:43
What I mean is: if you care about readability, of course you'll try to stick to common idioms (method calls rather than bracket property access). But if you don't, it might make your life simpler to use whichever approach allows a direct mapping from Haskell identifiers to JS ones.
– didierc, Commented Jun 18, 2014 at 8:51
@didierc Yes, I do want the generated code to be readable. I would like people to be able to understand the generated code and integrate it with their JavaScript applications.
– Aadit M Shah, Commented Jun 18, 2014 at 9:26

Twan van Laarhoven · Accepted Answer · 2014-06-18 08:59:07Z

3

GHC uses what it calls z-encoding. For example, >>= is encoded as zgzgze. See https://ghc.haskell.org/trac/ghc/wiki/Commentary/Compiler/SymbolNames
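
As a rough illustration, a z-encoder for operator characters might look like the TypeScript sketch below. The names `zCodes` and `zEncode` are made up for this example, and the table is transcribed from the wiki page as I understand it, so check it against that page before relying on it. Characters without a two-letter code are assumed to fall back to a hexadecimal escape of the form `z<hex>U`.

```typescript
// Two-letter codes for operator characters, transcribed from the GHC wiki
// table linked above (an assumption to verify, not an authoritative copy).
const zCodes: Record<string, string> = {
  "&": "za", "|": "zb", "^": "zc", "$": "zd", "=": "ze", ">": "zg",
  "#": "zh", ".": "zi", "<": "zl", "-": "zm", "!": "zn", "+": "zp",
  "\\": "zr", "/": "zs", "*": "zt", "%": "zv",
  // Literal 'z' and 'Z' are doubled so that decoding stays unambiguous.
  "z": "zz", "Z": "ZZ",
};

function zEncode(name: string): string {
  let out = "";
  for (const ch of name) {
    const code = zCodes[ch];
    if (code !== undefined) {
      out += code;
    } else if (/^[A-Za-z0-9]$/.test(ch)) {
      out += ch; // other letters and digits pass through unchanged
    } else {
      // Assumed fallback: 'z' + hex character code + 'U', with a leading 0
      // when the hex string would otherwise start with a letter.
      const hex = ch.charCodeAt(0).toString(16);
      out += "z" + (/^[0-9]/.test(hex) ? hex : "0" + hex) + "U";
    }
  }
  return out;
}
```

With this table, `zEncode(">>=")` yields `zgzgze`, matching the example above, while an ordinary name such as `gge` is left untouched.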


5 Comments

Aadit M Shah Over a year ago

I appreciate the fact that you found out what GHC officially does. Hence +1. Nevertheless, expanding punctuation characters to two-character codes doubles the size of operators. When readability and understandability count, that is unacceptable.

Twan van Laarhoven Over a year ago

The reason for expanding to two characters is to be completely unambiguous. You wouldn't want a function gge to conflict with the >>= operator. If you know that names don't mix symbols and letters, then you can get away with only an operator marker at the start of the name, say op_gge.
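
A minimal TypeScript sketch of that marker idea follows. The table `letterFor` and the function `mangleOperator` are hypothetical names, and the one-letter codes are simply an assumption chosen to reproduce the `op_gge` example:

```typescript
// Hypothetical one-letter codes, picked so that ">>=" becomes "gge".
const letterFor: Record<string, string> = {
  ">": "g", "=": "e", "<": "l", "&": "a", "|": "b", "+": "p", "*": "t",
};

function mangleOperator(op: string): string {
  let out = "op_"; // marker that separates operators from ordinary names
  for (const ch of op) {
    const letter = letterFor[ch];
    if (letter === undefined) {
      throw new Error(`no letter assigned to '${ch}'`);
    }
    out += letter;
  }
  return out;
}
```

Here `mangleOperator(">>=")` gives `op_gge`, which cannot be confused with a function called `gge`. The scheme still assumes that no ordinary identifier starts with `op_`, so a compiler using it would have to reserve or escape that prefix.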

Aadit M Shah Over a year ago

True. I was thinking along the lines of simply converting && to aa. However, if there's already a function named aa, then I would compile the operator to $aa. Since $ is not a valid character in varids in Haskell and $ is allowed in identifiers in JavaScript, this would resolve all ambiguities while also keeping the length of the symbol to a minimum.

didierc Over a year ago

But if the $aa symbol is already taken, you'll have to find another way. C simply prepends an underscore to every symbol, and the same problem arises there, although the standard used to discourage such names for anything other than system/compiler code. You don't really have that luxury.

Aadit M Shah Over a year ago

@didierc The $aa symbol can never be taken because Haskell doesn't allow $ in varids. The compiled JavaScript code will be namespaced. Hence it wouldn't cause any naming conflicts there either.
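
The scheme discussed in this thread could be sketched in TypeScript roughly as follows. The names `questionMapping` and `operatorName` are hypothetical, and the table contains only the three mappings given in the question; the rest of the mapping is deliberately left open:

```typescript
// Only the mappings stated in the question; other characters are unassigned.
const questionMapping: Record<string, string> = { "&": "a", "|": "l", "@": "q" };

// `identifiers` lists the ordinary (non-operator) names already in scope.
function operatorName(op: string, identifiers: readonly string[]): string {
  let name = "";
  for (const ch of op) {
    const letter = questionMapping[ch];
    if (letter === undefined) {
      throw new Error(`no letter assigned to '${ch}'`);
    }
    name += letter;
  }
  // Prefix with '$' only on a clash. An ordinary Haskell identifier cannot
  // contain '$', so the prefixed name cannot clash with one.
  return identifiers.indexOf(name) >= 0 ? "$" + name : name;
}
```

For example, `operatorName("&&", [])` produces `aa`, while `operatorName("&&", ["aa"])` produces `$aa`.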
