I am looking into splitting words into a succession of chemical element symbols, where possible. For example:
Titanic = Ti Ta Ni C (titanium, tantalum, nickel, carbon)
A word may or may not be decomposable under those rules, and if it is, the decomposition might not be unique. I did two things. The first is a function checking whether a decomposition is possible. I relied on a simple regular expression to do so:

    (* Lower-cased symbols of one or two letters; longer placeholder symbols are dropped *)
    elements = ToLowerCase /@ Select[Table[ElementData[i, "Symbol"], {i, Length@ElementData[]}], StringLength[#] < 3 &];
    (* One or more symbols in a row: "(h|he|li|...)+" *)
    regexp = RegularExpression["(" <> StringJoin@Riffle[elements, "|"] <> ")+"];
    decomposable[s_] := StringMatchQ[ToLowerCase@s, regexp];
    decomposable /@ {"Mathematica", "archbishop"}

which gives: {False, True}.
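To make the construction concrete, here is a minimal sketch of the same regular expression built from a handful of symbols. The list `sample` and the name `sampleRegexp` are hypothetical and only stand in for `elements` and `regexp`, so that the snippet runs without `ElementData`.

```mathematica
(* Hypothetical hand-picked subset of symbols, for illustration only *)
sample = {"ti", "ta", "ni", "n", "i", "c"};

(* Same construction as for the full list: the pattern is "(ti|ta|ni|n|i|c)+" *)
sampleRegexp = RegularExpression["(" <> StringJoin@Riffle[sample, "|"] <> ")+"];

(* StringMatchQ requires the whole string to match *)
StringMatchQ["titanic", sampleRegexp]  (* True: ti ta ni c *)
StringMatchQ["tact", sampleRegexp]     (* False: the final "t" is not in the list *)
```

The regular expression engine backtracks over the alternatives, so a wrong choice early in the word does not prevent a match later on.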
Implementing a function that returns the possible decompositions was slightly harder. I recently learnt of the existence of Sow and Reap via this very website, so I implemented the most naïve algorithm, an exhaustive search that tries every matching symbol at each position, with a recursive function:
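
    (* True if s starts with sub *)
    beginsWith[s_, sub_] := (StringTake[s, Min[StringLength[s], StringLength[sub]]] == sub);

    decompose0[s_, pre_] := Module[{list, remains},
      (* Nothing left to consume: record the finished decomposition *)
      If[StringLength[s] == 0, Sow[pre]];
      (* Symbols matching the start of s: at most one 1-letter and one 2-letter symbol *)
      list = Select[elements, beginsWith[s, #] &];
      remains = StringDrop[s, StringLength[#]] & /@ list;
      If[Length[list] >= 1, decompose0[remains[[1]], pre <> " " <> list[[1]]]];
      If[Length[list] >= 2, decompose0[remains[[2]], pre <> " " <> list[[2]]]];
    ];

    (* Flatten instead of [[2, 1]], so that an undecomposable word gives {} rather than a Part error *)
    decompose[s_] := Flatten[Reap[decompose0[ToLowerCase@s, ""]][[2]]];

This works nicely: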

    In:= decompose["archbishop"]
    Out= {" ar c h b i s h o p", " ar c h b i s ho p", " ar c h bi s h o p", " ar c h bi s ho p"}

    In:= decompose["titanic"]
    Out= {" ti ta n i c", " ti ta ni c"}

So, the question is: in which way could I use Mathematica’s higher-level functions, e.g. the pattern-matching ones, to improve the algorithm or the code simplicity? I'm not into code-golfing, so it's not about making the code shorter, but about using a better-optimized algorithm or writing higher-level code. (The above I could pretty much have written in C, C++ or Fortran, my usual languages.)
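
As a point of comparison, the same search could be sketched with pattern-based definitions and string patterns instead of explicit indexing. This is only an illustration of what "higher-level" might mean here, and it is not claimed to be better optimized. The names `symbols` and `splits` are hypothetical, and `symbols` is a small hand-picked subset standing in for `elements`.

```mathematica
(* Hypothetical small symbol list so the sketch runs without ElementData *)
symbols = {"ti", "ta", "ni", "n", "i", "c"};

(* Base case: the empty string has exactly one decomposition, the empty one *)
splits[""] = {{}};

(* Otherwise, try every symbol that prefixes s and recurse on the remainder *)
splits[s_String] := Join @@ Table[
   If[StringMatchQ[s, sym ~~ ___],
    Prepend[#, sym] & /@ splits[StringDrop[s, StringLength[sym]]],
    {}],
   {sym, symbols}];

splits["titanic"]
(* {{"ti", "ta", "ni", "c"}, {"ti", "ta", "n", "i", "c"}} *)
```

Each decomposition comes back as a list of symbols rather than a space-separated string, and a word with no decomposition yields an empty list.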


