$\begingroup$

I am importing data such as

data={"t*α₁","α₁*α₄"} 

But unfortunately, all of the subscripts are in Unicode:

FullForm[data]
(* List["t*α\:2081","α\:2081*α\:2084"] *)

That is, subscript 1 is encoded as \:2081, and so on. Is there a way to convert these Unicode subscripts into Mathematica's standard Subscript form, like this?

desiredoutput={t Subscript[α, 1], Subscript[α, 1] Subscript[α, 4]} 

Manipulating the data is difficult with the current Unicode formatting of the subscripts.

$\endgroup$
  • 2
    $\begingroup$ How many different variations do you have of those subscripts? If there aren't very many, then constructing replacement rules "by hand" could be the easiest way: ToExpression[data] /. \[Alpha]₁ -> Subscript[\[Alpha], 1] and similar others. $\endgroup$ Commented Jan 22, 2024 at 17:21

1 Answer

$\begingroup$

Here is code that works for subscript numerals, even when they form a multi-digit number. You can extend it to handle other subscript characters as well.

subscriptNumerals = CharacterRange["₀", "₉"];

convertSubscripts[str_String] :=
  ToExpression[str] /.
    (sym_Symbol /; StringContainsQ[SymbolName[sym], subscriptNumerals]) :>
      First@StringReplace[SymbolName[sym],
        (x : Except[subscriptNumerals]) ~~ (i : subscriptNumerals ..) :>
          Subscript[x, FromDigits[(First@*ToCharacterCode /@ Characters[i]) - 8320]]]

convertSubscripts["α₂ + t β₁₄ z + 3γ₁₇₈"]
(* Subscript["α", 2] + t z Subscript["β", 14] + 3 Subscript["γ", 178] *)
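
The offset 8320 used above is the code point of "₀" (U+2080), so subtracting it maps each subscript numeral to its digit. A minimal sketch of that step in isolation, using only built-in functions (the sample string "₁₄" is just an illustration):

```mathematica
(* Code points of the subscript numerals ₁ (U+2081) and ₄ (U+2084) *)
ToCharacterCode["₁₄"]
(* {8321, 8324} *)

(* Subtract the code point of ₀ (U+2080 = 8320) to get the digits *)
ToCharacterCode["₁₄"] - 8320
(* {1, 4} *)

(* Combine the digits into a single integer *)
FromDigits[ToCharacterCode["₁₄"] - 8320]
(* 14 *)
```

To process the list from the question, the function can be mapped over it with convertSubscripts /@ data. The result carries string bases such as Subscript["α", 1]; if symbols are needed instead, a rule like Subscript[s_String, n_] :> Subscript[Symbol[s], n] should convert them.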
$\endgroup$
  • $\begingroup$ Works perfectly. Just needed to modify the first line from "₁" to "₀" for the argument of CharacterRange to handle subscript 10 etc. Thanks! $\endgroup$ Commented Jan 22, 2024 at 17:55
  • $\begingroup$ Oh, right, of course :) Forgot about the zero ... $\endgroup$ Commented Jan 22, 2024 at 17:59
