
Per kristinalustig's request, I'm posting a standalone MSE report collecting all the languages that the new code block labeling feature (rolled out this week as part of the copy-with-attribution feature) is guessing incorrectly. The request was:

Could you please create a separate meta post bug report where these misidentified language issues can be collected so that we can triage separately? It's probably a larger issue than we can tackle immediately and I don't want it to get lost in the shuffle.

Here is a list of every language recorded so far as being misidentified, with an example of each incorrect guess:

Please feel free to edit this table to add examples of languages not yet listed here (only one entry per language, please).

  • Note: I'm told ColdFusion uses 'default' as its syntax hint, so the highlighter will guess wildly for every code block in questions with that tag. It may not be a good example to use for this report, but I'm not a ColdFusion expert, so I'll defer to people who know that language. Commented Nov 6 at 17:32
  • I don't think building a table is worthwhile. This is a long-standing issue with the syntax highlighter when it is forced to guess the language. It's (mostly) not down to the language but to the tags used. If more than one tag defines a syntax hint, then any code block without an explicit hint will fall back to guess mode. And when guessing, it doesn't matter what language the code block is actually in. JS code can be detected as Java, or as Lisp, or as something else, depending on what code is present. Commented Nov 6 at 17:34
  • And yes, the same applies to tags whose type hint is "default". ColdFusion and .NET seem to do that. Commented Nov 6 at 17:34
  • The powershell tag defines the syntax hint as lang-bash. But I don't think there is specific PowerShell highlighting. Using lang-powershell seems to just resolve as lang-none. Commented Nov 6 at 17:37
  • @VLAZ I admit the table will eventually get exhaustively long, but given how "in-your-face" the bug is now that the new copy feature exists, I'm hopeful that a table that keeps getting bigger will push staff to either remove the labels or actually fix the underlying problem. Commented Nov 6 at 17:39
  • There are numerous Python-related questions that don't have a Python tag, neither the generic one nor a version-specific one, e.g. Pandas (23,494), Django (142,155), NumPy (7,571). With Django, though, that's somewhat understandable: meta.stackoverflow.com/q/320277 Commented Nov 6 at 18:06
  • Quite an abundance of SCSS and others here: stackoverflow.com/questions/26648227/… (I think it is supposed to be Java; however, the tags and question do not specify) Commented Nov 7 at 21:10
  • Question about the "Language tagged" column (the first one): is this the 'dominant' tag when there is more than one language, for example javascript + html? Commented Nov 9 at 11:22
  • The issue is wider than incorrect recognition of the question's tags. The syntax highlighter may simply not have the "correct" language, and it will use the wrong one even when you tag your blocks explicitly, which I always do. Commented Nov 9 at 15:27
  • @GSerg To clarify, it will only use the wrong language when you tag your code blocks with a language that highlight.js has no highlighting rules for. If you explicitly tag your blocks with, e.g., JS, then they will correctly display as JS even if the question has only a c++ tag. Commented Nov 10 at 14:09
  • @Wolf Yes, the language tagged column is for what language tag(s) the question has applied to it. Commented Nov 10 at 14:10
  • @TylerH in that case my addition should probably be JavaScript, HTML? Commented Nov 10 at 15:46
  • @Wolf In this case I think just JavaScript; while it looks like HTML, it's technically not; it's React (a JavaScript framework) code that happens to contain HTML as a sort of literal string. The code in that question would not be legal HTML if you tried to use an HTML parser or syntax validator, for example. Commented Nov 11 at 17:04
  • Reproducing this comment: "@kristinalustig when will the fix be fixed? It's literally wrong ~80% of the time in my >9k answers. I cannot afford to go back and fix. See for e.g. this mis-identifies c++ as ini, cpp, scss, rust, rust, less, php, php, ruby, cpp, cpp, scss, less, dart, lisp, cpp, rust, php .... shell, haskell. That's all in a single answer tagged c++" ¯\_(ツ)_/¯ Commented Nov 12 at 15:20
  • @Wolf Yes, it would be important to handle that, but I think that's a broader and secondary discussion. It would be better served by something like a 'parent language/technology' tag (i.e. a tag system redesign), and getting it right for a single tag comes first anyway, since most questions aren't about multiple languages. Once it works for questions with one language tag, they can look at how to handle multiple. Commented Nov 12 at 15:23
