(I was not going to post this answer, but then it occurred to me that it is a continuation of a discussion with Mr.Wizard over programming styles.)
I think this answer provides an elegant solution -- at least for the less critical readers.
Note that the grammar generation code is the most complicated part. The rest is fairly direct and straightforward.
This generates an EBNF grammar for sequences of chemical elements:

    (* Load the FunctionalParsers package *)
    Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/FunctionalParsers.m"]

    (* Lowercased element abbreviations *)
    elemNames = ToLowerCase[Sort[ElementData[#, "Abbreviation"] & /@ ElementData["*"]]];

    (* One alternative per element: a single letter becomes 'x';
       a multi-letter symbol becomes a comma-separated sequence of quoted letters *)
    ebnfChemElem =
      "<chem-elem> = " <>
       StringReplace[
        ToString[
         If[StringLength[#] == 1,
            "'" <> # <> "'",
            StringJoin @@ Riffle["'" <> # <> "'" & /@ Characters[#], ","]] & /@ elemNames],
        {", " -> " | ", "{" -> "", "}" -> ""}] <> ";";

    ebnfChemSplit = "<chem-split> = { <chem-elem> } ;";

    ebnfChemElem <> "\n" <> ebnfChemSplit

The rule for <chem-split> specifies that <chem-split> is a list of <chem-elem> strings.
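To make the shape of the generated rule easier to see, here is a minimal sketch that applies the same string construction to three element symbols only. The names `sampleNames`, `quoteElem`, and `sampleRule` are introduced here for illustration and are not part of the original code; the sample list is a hand-picked assumption that stands in for the full `ElementData` list.

```mathematica
(* Assumption: a hand-picked sample instead of the full ElementData list *)
sampleNames = {"h", "he", "li"};

(* quoteElem is a hypothetical helper that mirrors the If[...] in the answer:
   a single letter is quoted; a multi-letter symbol becomes quoted letters joined by "," *)
ClearAll[quoteElem];
quoteElem[name_String] :=
  If[StringLength[name] == 1,
   "'" <> name <> "'",
   StringJoin @@ Riffle["'" <> # <> "'" & /@ Characters[name], ","]];

(* ToString of the list inserts ", " between items; only those separators become " | " *)
sampleRule = "<chem-elem> = " <>
   StringReplace[ToString[quoteElem /@ sampleNames],
    {", " -> " | ", "{" -> "", "}" -> ""}] <> ";"
(* "<chem-elem> = 'h' | 'h','e' | 'l','i';" *)
```

The trick worth noticing is that the letters inside one symbol are joined with "," (no space), while `ToString` separates list items with ", " (with a space), so the replacement of ", " by " | " touches only the boundaries between elements.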
The following code generates parsers and adds parser modifiers to concatenate the characters:

    res = GenerateParsersFromEBNF[ParseToEBNFTokens[ebnfChemElem <> ebnfChemSplit]];
    LeafCount[res]
    (* 4646 *)

    (* Join the characters of each parsed element, then of each element in a split *)
    SetParserModifier[pCHEMELEM, StringJoin[Flatten[{#}]] &];
    SetParserModifier[pCHEMSPLIT, Map[StringJoin, #] &];

The following parsing examples use the generated parser pCHEMSPLIT:

    words = {"titanic", "silicon", "archbishop", "wombat", "mathematica"};
    ParsingTestTable[pCHEMSPLIT, words, "TokenizerFunction" -> Characters]

The above does not parse all possible cases because ParseMany is implemented to pick shortest parsing paths. The parser generator uses ParseMany for { ... } parts in the EBNF rules.
With an alternative, greedier ParseMany implementation we can get all possible valid parsings:

    pChemElem = ParseModify[DeleteDuplicates, ParseApply[Flatten, ParseManyByBranching[pCHEMELEM]]];
    words = {"titanic", "silicon", "archbishop", "wombat", "mathematica"};
    ParsingTestTable[pChemElem, words, "TokenizerFunction" -> Characters]
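As a cross-check on what "all possible valid parsings" should contain, the following is a brute-force sketch that enumerates every split of a word into element symbols by plain recursion. It does not use FunctionalParsers.m at all; `allElementSplits`, `sampleNames`, and `maxLen` are hypothetical names introduced here, and the symbol list is a hand-picked subset (an assumption) so that the block runs without `ElementData`.

```mathematica
(* Assumption: a hand-picked subset of lowercased element symbols,
   used here so the sketch runs without ElementData *)
sampleNames = {"s", "i", "si", "li", "c", "o", "co", "n"};
maxLen = Max[StringLength /@ sampleNames];

(* allElementSplits is a hypothetical helper, not part of FunctionalParsers.m *)
ClearAll[allElementSplits];

(* The empty word has exactly one split: the empty list of elements *)
allElementSplits[""] := {{}};

(* Try every prefix of length 1..maxLen; if it is a known symbol,
   keep it and split the remainder recursively *)
allElementSplits[word_String] :=
  Join @@ Table[
    If[StringLength[word] >= n && MemberQ[sampleNames, StringTake[word, n]],
     Prepend[#, StringTake[word, n]] & /@ allElementSplits[StringDrop[word, n]],
     {}],
    {n, 1, maxLen}];

allElementSplits["silicon"]
(* {{"s", "i", "li", "c", "o", "n"}, {"s", "i", "li", "co", "n"},
    {"si", "li", "c", "o", "n"}, {"si", "li", "co", "n"}} *)

allElementSplits["wombat"]
(* {} : no split exists within this subset *)
```

Replacing `sampleNames` with the full `elemNames` list should give a reference set of splits to compare against the parser output, at the cost of exponential growth for long words.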


