Let's workshop an examples question

Question

There has been a lot of pushback on questions seeking examples of certain things in different languages, but also a lot of people noting that they see value in surfacing different approaches taken in real-world usage that they haven't encountered. I'd like to assemble a consensus "good" question in that vein for this site, or establish that it's not achievable. To do that, I want to workshop one into shape.

I have a topic in mind that:

is about semantics, rather than syntax;
has a restricted scope of relevant languages, not just "what are some ...";
I know has genuine contrasting examples.

This draws on one of the long-standing entries on my list of possible academic studies. All this is to say that I believe the topic of the question is worthy of study, non-trivial, and bounded, so it avoids the particular objections raised against "examples of syntax for X" questions and keeps the focus on the nature of this type of question itself. However, I haven't posted it because, frankly, I can't figure out how to frame it as a question that is a good fit for the site.

I'm putting the direct version below and inviting answers proposing how it could be shaped into a form that is suitable for the site. We might establish that there isn't a suitable version, which will be good to figure out too, or that everyone is actually fine with this version. I want to keep this discussion separate from the concrete examples on the main site, where people have already been more directly involved, so this is a fresh question nobody has engaged with before.

How have modern languages dealt with (Unicode) strings?

Languages developed over the last fifteen years or so have been well within the "ubiquitous Unicode" era, and have been able to design their string types accordingly.

I'm looking for examples of how different real-world languages with their primary release between 2008 and 2018 represent textual strings, how access within strings (e.g. indexing or iteration) behaves at the language level, how string equality works, what performance or semantic tradeoffs were made along the way, and, where appropriate, how these choices have been received by programmers in the years since.

Relevant aspects of Unicode itself might include Unicode transformation formats (UTFs), normalisation, codepoints, code units, and grapheme clusters; language syntax is relevant only insofar as it supports the semantics.

For context on answers here: at the very least Rust, Swift, Go, and Raku all meet these criteria, all have deliberately addressed Unicode in their native string type, and all have made vastly different choices from each other within a similar timeframe. There are certainly substantive answers to this question. What I want to establish here is a consensus on how to elicit useful answers with a question like this, or that it just can't be done.

Is the 2018 cutoff to allow for getting data on "how these choices have been received by programmers over the time since"? (Sorry if it's a "dumb" question.) Also, you say "primary release from 2008-2018", but what if the Unicode part came later in the form of library functions? E.g. not really a post-2008 release, but github.com/tc39/proposal-intl-segmenter/issues/114
– starball, Commented May 23, 2023 at 7:58
Yes, I picked five years for historical perspective and ten years back from that. I'm not wedded to the dates; I just wanted a clearly fixed set of languages. I would probably not have counted the segmenter module, since it didn't have an impact on the language design, but I'm open to suggestions that it should count.
– Michael Homer Mod, Commented May 23, 2023 at 8:12
So is this example question's scope limited to the "core language", excluding things like standard library facilities? If so, I'm curious about the rationale. For a given feature, some languages may put it in the language and some in their standard libraries, so the constraint seems a bit wasteful (perhaps even arbitrary?) to me, at least from my limited thinking and without hearing your reasoning. In a sense, who am I to talk, but I think (zero evidence) evolution mechanisms are not really something that can be put at the standard library level.
– starball, Commented May 23, 2023 at 8:38
It's just not possible to affect the language design backwards in time, since that already happened well before the segmenter. It's not about whether something is in library code or not; much of Go's string handling, for example, is in the library by design. Perhaps it would be an answer, just probably an uninteresting one, since it's an ordinary library that had no impact on anything.
– Michael Homer Mod, Commented May 23, 2023 at 8:46

Gilles 'SO- stop being evil' · Accepted Answer · 2023-05-23 22:02:41Z

This kind of question is generally too broad. I would close the version you posted as too broad. There are so many different considerations that you'd need to write a whole book chapter to cover the topic adequately. For example:

Many languages are constrained by compatibility with pre-Unicode versions of the language. If that's not an issue for you, that alone removes a lot of baggage.
Do you need to handle strings in a non-Unicode encoding? E.g. what should happen when reading a file that is apparently text but turns out to contain invalid UTF-8?
Do you need to distinguish between different but equivalent representations (e.g. combining characters vs precomposed legacy characters)? Can you afford a huge (by certain standards) dependency on a Unicode canonicalization implementation in your standard string library?
etc.

A templating language for the web, a macro language for a text editor, a language for industrial automation, and a language for symbolic mathematical computation are going to have vastly different requirements. Pick one.

To be clear, this is intended to cover the sorts of questions asking for examples of what has been done before, not "how can I do X". There are a lot of both on the site that are not so great, but here I'm focusing on the former, where design constraints aren't so obviously crucial and people have argued that they find value in exposure to unknown approaches. I expect your answer would be that seeking instances is inherently unsuitable for a question, but are you suggesting that giving a limited application domain ("examples of how general-purpose/templating/... languages did X") would be suitable?
– Michael Homer Mod, Commented May 23, 2023 at 22:30
@MichaelHomer Seeking instances is rarely a useful focus in a question: if there are existing languages that are close to what you're trying to do, any good answer would mention them. A limited application domain at least gives a chance that the examples would be representative. But really, the question should be: I want to do X, how do I do it? That naturally invites experience from people who've done X or nearly-X, with the application domain constraining nearly-X to a manageable size. Asking how everybody has done X is asking for a very broad survey paper: that's too much.
– Gilles 'SO- stop being evil', Commented May 23, 2023 at 22:35

Michael Homer · Accepted Answer · 2024-06-24 03:05:11Z

This question was posted a while ago and went... OK. We can refer to that experiment and consider how we'd want to address these sorts of things in the future, but that will be a new question rather than this one.

This answer is so that I can accept it and close the question, because otherwise it will keep getting automatically bumped by the system forever.
