  • 10
    I think the most important thing here is to avoid the appearance of "random" behavior with the highlighting. When someone goes to the trouble of manually specifying the correct language, the last thing they should expect to see is entirely random highlighting. Commented Oct 27, 2020 at 12:33
  • 2
    That is also partly because people don't check which languages are supported, and hence assume that certain behaviour is supported when it isn't, which in the end leads to confusion. Commented Oct 27, 2020 at 12:36
  • 12
    SE's current behavior doesn't do anything to help with this confusion, though. There is a reason our default behavior (as a library) is to fall back to NO highlighting when an incorrect or unknown language is provided. It provides feedback: it makes it clear that something is wrong, i.e. that the language "is not supported". Perhaps the user typed it wrong. We also log an error to the console, but SE could of course show a visual warning if they chose to: "groovy is not a supported language currently - [link to supported list]". They could even do this during composition. Commented Oct 27, 2020 at 12:41
  • 1
    If only people would actually read the warnings. Commented Oct 27, 2020 at 12:54
  • 12
    That's a big reason we default to "no highlight"... it's a warning that's hard to miss, doesn't require reading, and yet provides immediate feedback to the user. :-) And if someone was a regular SO user, they'd very quickly learn what no highlighting signaled: an unsupported language or a typo in the language name. Inconsistent highlighting is much harder to "see" at a glance, which makes it far worse feedback; it may even be impossible to spot without multiple examples. Commented Oct 27, 2020 at 13:01
  • I fully agree with the question, but then I wonder what automatic detection is good for. It seems to be useful only when the content creator has no clue what language the code they just provided is written in, and then we would again get potentially random behavior. Automatic detection can always lead to random behavior, or not? As for Groovy, StackOverflow could just enable the Groovy grammar. Does the highlight.js auto-detection get things wrong more often as the number of possible languages increases? Commented Oct 27, 2020 at 14:29
  • 6
    Auto-detect lets post authors put in less effort - you don't need to think about tagging every snippet. This often works very well when paired with the post's tag context, i.e. a post tagged "javascript" is far more likely to contain JavaScript than, say, SQL. (Though SE still has tons of room to improve here too.) I'm not suggesting we remove ALL auto-detection, only auto-detection where it's known in advance that the outcome is likely to be poor (such as when a grammar is requested that SE has chosen not to load). Commented Oct 27, 2020 at 14:54
  • 2
    "As for Groovy, StackOverflow could just enable the Groovy grammar." I'm pretty sure it's a space/size concern, not a reliability concern. Every language makes their site slower to download... our library is almost 1 MB if you include every language, yet ~50 KB for a small, popular set of languages. "Does the highlight.js auto-detection get things wrong more often as the number of possible languages increases?" That's certainly a possibility (more to choose from), though SE could mitigate it greatly with smarter use of tags to clue the auto-detection. Commented Oct 27, 2020 at 14:56
  • 11
    "Automatic detection can always lead to random behavior..." No... the hope is that you get predictable behavior based on relevant content (when there is enough signal in the snippet), not random behavior. But when you remove the "right" answer (purposely not loading a grammar, etc.) and then make the highlighter choose from a bunch of sub-optimal answers [none of them good matches]... you're far more likely to get randomness, since you have no idea what the correct signal even is. Commented Oct 27, 2020 at 15:06
  • 1
    It would also be possible to lazy-load a more complete (or just a different) set of grammars based on either the specified language or the language specified as the default for the tag (currently there's a very limited list of permitted tag-default syntax highlighting languages, so that would need to change). Being able to load more languages, or even the entire language set, doesn't require downloading the entire set on every page. It's more complicated, but SE already has a mechanism in its JavaScript to lazy-load additional packages when needed. Commented Oct 27, 2020 at 17:22
  • 2
    Absolutely. Lazy-loading is the answer if the goal is to highlight as many languages as correctly as possible. Though SE has also mentioned the monetary cost of bandwidth, not just bundle size from a load-time perspective. Creative client-side caching would help a lot there, I think, because once you've downloaded a grammar there's no need to ever download it again, at least not until a new version of the library is used. Commented Oct 27, 2020 at 17:38
  • One way to approach this is by ending the practice of tags lacking a specified highlighting language triggering auto-detect... but that may do more harm than good, since there are plenty of tags where a language simply doesn't make sense, especially the ones denoting concepts rather than libraries or languages (e.g. array). I think you nailed it in your comment: the best-case scenario from a usage POV would be to lazy-load it if the language specified/detected wasn't already loaded. Commented Oct 27, 2020 at 22:06
  • 4
    @zcoop98 "if the language specified/detected wasn't already loaded" This wording is a bit confusing. A language grammar must be loaded before it can be auto-detected... so there is no such thing as "This looks like Groovy, so now lazy-load Groovy". But if a post was hinted groovy, SE could choose to lazy-load Groovy and use it explicitly. Or if a post was tagged groovy (among other things), SE could lazy-load Groovy before highlighting, and then auto-detect would consider Groovy as a possibility when doing its analysis. Commented Oct 27, 2020 at 23:54
  • 2
    "ending the practice of tags lacking a specified highlighting language triggering auto-detect" As you say, that might be too radical. Really, a list of all valid language tags is necessary... so that, given a tag, the JS can ask "is groovy a language tag or a generic concept tag?" And if it's a language (one that's simply not in the default bundle), then that would either trigger the lazy-load or simply turn off highlighting for that block. Commented Oct 28, 2020 at 0:01
  • 2
    Oh yeah! Wanted to plug this relevant user script by @LionelRowe that implements lazy loading of highlight.js language libraries. It only works when the language is explicitly specified by a lang-X identifier (rather than a tag); however, it does succeed in highlighting languages currently unsupported by SE. Commented Oct 28, 2020 at 14:56
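The "fall back to no highlighting" policy argued for in the comments above can be sketched as a small decision function for explicit hints such as lang-groovy. This is a minimal, hypothetical sketch, not Stack Exchange's or highlight.js's actual code: `resolveExplicitHint`, the `available` set (grammars that could be fetched on demand) and the `lazyLoadingEnabled` flag are assumptions. In a real integration the `loaded` check would more likely be `hljs.getLanguage(name)`, which returns `undefined` for a grammar that has not been registered (and also resolves aliases, which this sketch ignores).

```typescript
// Hypothetical sketch: how a site might act on an explicit language hint such as lang-groovy.
// None of these names come from Stack Exchange or highlight.js.
type HintDecision =
  | { kind: "highlight"; language: string } // grammar is already loaded: use it directly
  | { kind: "lazy-load"; language: string } // grammar exists but must be fetched first
  | { kind: "none"; warning: string }; // leave the block unhighlighted and say why

function resolveExplicitHint(
  hint: string,
  loaded: ReadonlySet<string>, // grammars bundled with the page
  available: ReadonlySet<string>, // grammars that could be fetched on demand (assumed list)
  lazyLoadingEnabled: boolean,
): HintDecision {
  const language = hint.trim().toLowerCase();
  if (loaded.has(language)) {
    return { kind: "highlight", language };
  }
  if (lazyLoadingEnabled && available.has(language)) {
    return { kind: "lazy-load", language };
  }
  // Deliberately no auto-detection here: the author named a language, so guessing
  // among unrelated grammars would produce the "random" highlighting discussed above.
  return {
    kind: "none",
    warning: `${language} is not a supported language currently`,
  };
}
```

With javascript, sql and python loaded and lazy loading off, a groovy hint yields `kind: "none"`, whose `warning` could be shown in the editor preview; a typo such as grooovy yields `none` even with lazy loading on, which is exactly the feedback the "no highlight" default is meant to give.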
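The idea of using tags to narrow auto-detection, while telling language tags apart from concept tags such as array, could look roughly like the following. It is a sketch under assumptions: the `tagLanguage` map (language tags only) and the function name `planAutoDetect` are hypothetical, and switching highlighting off when a language tag's grammar isn't loaded is one conservative policy, not something SE is known to do. The returned `subset` is the kind of array highlight.js accepts as the optional second argument of `hljs.highlightAuto(code, languageSubset)`.

```typescript
// Hypothetical sketch: deciding whether and how to auto-detect using a post's tags.
type AutoDecision =
  | { kind: "auto"; subset: string[] } // auto-detect, but only among these grammars
  | { kind: "none"; reason: string }; // don't guess: the likely grammar is missing

function planAutoDetect(
  tags: readonly string[],
  tagLanguage: ReadonlyMap<string, string>, // language tags only, e.g. "groovy" -> "groovy"
  loaded: ReadonlySet<string>,
): AutoDecision {
  const candidates: string[] = [];
  for (const tag of tags) {
    const language = tagLanguage.get(tag);
    if (language === undefined) {
      continue; // concept tag such as "arrays": carries no language signal
    }
    if (!loaded.has(language)) {
      // The probably-correct grammar was never loaded, so any auto-detected
      // result would be a pick among poor matches.
      return { kind: "none", reason: `tag "${tag}" needs unloaded grammar "${language}"` };
    }
    if (!candidates.includes(language)) {
      candidates.push(language);
    }
  }
  // No language tags at all: consider every loaded grammar, as plain auto-detect would.
  return { kind: "auto", subset: candidates.length > 0 ? candidates : Array.from(loaded) };
}
```

Returning `none` instead of fetching keeps the sketch synchronous; a site with lazy loading could fetch the missing grammar at that point instead, as suggested in the thread.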
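The lazy-loading and caching ideas from the later comments can be sketched as a loader that memoizes one promise per grammar, so each grammar is requested at most once per page even when several code blocks ask for it at the same time. The fetch and register callbacks are hypothetical placeholders: a real integration might download a per-language highlight.js build and pass its language function to `hljs.registerLanguage(name, languageFn)`. Caching across visits, as suggested above, would come from HTTP caching of versioned grammar URLs, which this in-memory sketch does not cover.

```typescript
// Hypothetical sketch: fetch each grammar at most once per page and share the
// in-flight request between all code blocks that need it.
type GrammarFetcher<G> = (language: string) => Promise<G>;
type GrammarRegistrar<G> = (language: string, grammar: G) => void;

class GrammarLoader<G> {
  private readonly pending = new Map<string, Promise<G>>();
  private readonly fetchGrammar: GrammarFetcher<G>;
  private readonly register: GrammarRegistrar<G>;

  constructor(fetchGrammar: GrammarFetcher<G>, register: GrammarRegistrar<G>) {
    this.fetchGrammar = fetchGrammar;
    this.register = register;
  }

  load(language: string): Promise<G> {
    const key = language.toLowerCase();
    const cached = this.pending.get(key);
    if (cached !== undefined) {
      return cached; // already loaded or still downloading: reuse the same promise
    }
    const promise = this.fetchGrammar(key).then((grammar) => {
      this.register(key, grammar); // make the grammar available to the highlighter
      return grammar;
    });
    this.pending.set(key, promise);
    // Forget a failed download so that a later request can retry it.
    promise.catch(() => {
      if (this.pending.get(key) === promise) {
        this.pending.delete(key);
      }
    });
    return promise;
  }
}
```

Sharing the promise rather than a "loaded" flag means two groovy blocks rendered in the same tick trigger a single request, and both wait for the same download.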