Converting Unicode Number Subscripts to Standard Mathematica Subscript Notation

Question

I am importing data such as:

data={"t*α₁","α₁*α₄"}

But unfortunately, all of the subscripts are in Unicode:

FullForm[data]
(* List["t*α\:2081", "α\:2081*α\:2084"] *)

That is, subscript 1 is encoded as \:2081, and so on. Is there a way of converting these Unicode subscripts into Mathematica's standard Subscript notation, as in the following?

desiredoutput={t Subscript[α, 1], Subscript[α, 1] Subscript[α, 4]}

Manipulating the data is difficult with the current Unicode formatting of the subscripts.

How many different variations of those subscripts do you have? If there aren't very many, constructing replacement rules "by hand" could be the easiest way: ToExpression[data] /. \[Alpha]₁ -> Subscript[\[Alpha], 1], and similarly for the others.
– MarcoB, Commented Jan 22, 2024 at 17:21
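
A minimal sketch of the hand-written rules suggested in this comment, assuming the only subscripted symbols in the data are α₁ and α₄ (the name `rules` is introduced here for illustration):

```mathematica
data = {"t*α₁", "α₁*α₄"};

(* One hand-written rule per subscripted symbol that occurs in the data *)
rules = {α₁ -> Subscript[α, 1], α₄ -> Subscript[α, 4]};

(* Parse each string into an expression, then apply the rules *)
ToExpression /@ data /. rules
```

This should reproduce the desired output from the question, but the rule list has to be extended by hand for every new subscripted symbol.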

Domen · Accepted Answer · 2024-01-22 17:59:23Z

Here is code that works for subscript numerals, even when they form a multi-digit number. You can modify it to handle other subscript characters as well.

subscriptNumerals = CharacterRange["₀", "₉"];

convertSubscripts[str_String] :=
 ToExpression[str] /.
  (sym_Symbol /; StringContainsQ[SymbolName[sym], subscriptNumerals]) :>
   First@StringReplace[SymbolName[sym],
     (x : Except[subscriptNumerals]) ~~ (i : subscriptNumerals ..) :>
      Subscript[x, FromDigits[(First@*ToCharacterCode /@ Characters[i]) - 8320]]]

convertSubscripts["α₂ + t β₁₄ z + 3γ₁₇₈"]
(* Subscript["α", 2] + t z Subscript["β", 14] + 3 Subscript["γ", 178] *)
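
The offset 8320 in the code above comes from the Unicode layout: the subscript digits ₀ to ₉ sit at consecutive code points U+2080 to U+2089, and hexadecimal 2080 is decimal 8320. The following short sketch walks through that arithmetic on its own; it applies ToCharacterCode to the whole digit string at once, which should give the same list as mapping it over the individual characters.

```mathematica
(* Subscript digits occupy consecutive code points starting at 8320 (U+2080) *)
ToCharacterCode["₀₁₄₉"]
(* {8320, 8321, 8324, 8329} *)

(* Subtracting 8320 leaves the ordinary digit values *)
ToCharacterCode["₁₇₈"] - 8320
(* {1, 7, 8} *)

(* FromDigits assembles the digit list into a single integer *)
FromDigits[ToCharacterCode["₁₇₈"] - 8320]
(* 178 *)
```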

Works perfectly. I just needed to change the first argument of CharacterRange from "₁" to "₀" so that subscripts such as 10 are handled. Thanks!
– Luke, Commented Jan 22, 2024 at 17:55
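
Note that the output shown in the answer has string heads such as "α", whereas the desired output in the question uses symbols. A possible variant, offered only as a sketch, works on character codes and builds the head with Symbol. The names `subscriptDigitQ`, `toSubscript`, and `convertSubscriptsToSymbols` are hypothetical helpers, not part of the accepted answer. The sketch assumes that the subscript digits appear only at the end of a symbol name and that none of the symbols has a value assigned.

```mathematica
(* True for the code points of ₀..₉ (U+2080..U+2089) *)
subscriptDigitQ[c_Integer] := 8320 <= c <= 8329

(* Hypothetical helper: split a symbol name into a base name and trailing subscript digits *)
toSubscript[sym_Symbol] :=
 Module[{codes = ToCharacterCode[SymbolName[sym]], n},
  (* n = number of subscript digits at the end of the name *)
  n = LengthWhile[Reverse[codes], subscriptDigitQ];
  If[0 < n < Length[codes],
   Subscript[Symbol[FromCharacterCode[Drop[codes, -n]]],
    FromDigits[Take[codes, -n] - 8320]],
   sym (* no trailing subscript digits, or nothing but digits: leave unchanged *)]]

(* Parse the string, then rewrite every symbol that ends in subscript digits *)
convertSubscriptsToSymbols[str_String] :=
 ToExpression[str] /. sym_Symbol :> toSubscript[sym]

convertSubscriptsToSymbols["ab₁₂"]
```

Assuming "ab₁₂" parses as a single symbol, the last line should return Subscript[ab, 12], with a symbol head and a multi-character base name kept intact.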
