edited Jun 3, 2017 at 10:55

(I was not going to post this answer, but then it occurred to me that it is a continuation of a discussion with Mr.Wizard over programming styles.)

I think this answer provides an elegant solution, at least for less critical readers.

Note that the grammar generation code is the most complicated part; the rest is fairly straightforward.

The following code generates an EBNF grammar for sequences of chemical element symbols:

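    (* Load the FunctionalParsers package *)
    Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/FunctionalParsers.m"]

    (* Lowercase abbreviations of all chemical elements *)
    elemNames = ToLowerCase[Sort[ElementData[#, "Abbreviation"] & /@ ElementData["*"]]];

    (* One alternative per element; a multi-letter symbol becomes a
       comma-separated sequence of its characters, e.g. 'h','e' *)
    ebnfChemElem =
      "<chem-elem> = " <>
       StringReplace[
        ToString[
         If[StringLength[#] == 1,
            "'" <> # <> "'",
            StringJoin @@ Riffle["'" <> # <> "'" & /@ Characters[#], ","]] & /@ elemNames],
        {", " -> " | ", "{" -> "", "}" -> ""}] <> ";";

    (* A chem-split is zero or more chem-elem matches *)
    ebnfChemSplit = "<chem-split> = { <chem-elem> } ;";

    ebnfChemElem <> "\n" <> ebnfChemSplit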

The rule for <chem-split> specifies that a <chem-split> is a sequence of zero or more <chem-elem> strings.
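The string construction for <chem-elem> is easiest to check on a small input. The sketch below applies the same expression to a hypothetical three-symbol list, {"h", "he", "o"}. The names sampleNames and sampleChemElem are illustrative only. This makes the shape of the generated rule visible without listing all the elements:

```mathematica
(* Hypothetical three-symbol subset instead of all elements *)
sampleNames = {"h", "he", "o"};

(* Same construction as for ebnfChemElem: ToString gives "{'h', 'h','e', 'o'}",
   then ", " becomes " | " and the braces are dropped *)
sampleChemElem =
  "<chem-elem> = " <>
   StringReplace[
    ToString[
     If[StringLength[#] == 1,
        "'" <> # <> "'",
        StringJoin @@ Riffle["'" <> # <> "'" & /@ Characters[#], ","]] & /@ sampleNames],
    {", " -> " | ", "{" -> "", "}" -> ""}] <> ";"
(* "<chem-elem> = 'h' | 'h','e' | 'o';" *)
```

The comma inside 'h','e' is not followed by a space, so the ", " -> " | " rule leaves it alone. It stays an EBNF sequence, while the list separators become alternatives.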

The following code generates parsers and adds parser modifiers to concatenate the characters:

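    res = GenerateParsersFromEBNF[ParseToEBNFTokens[ebnfChemElem <> ebnfChemSplit]];
    LeafCount[res]
    (* 4646 *)

    (* Join the parsed characters of one element symbol into a single string *)
    SetParserModifier[pCHEMELEM, StringJoin[Flatten[{#}]] &];
    (* Apply StringJoin to each element of a parsed split *)
    SetParserModifier[pCHEMSPLIT, Map[StringJoin, #] &];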
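To see what the two modifier functions do, they can be applied by hand, outside the parser. This is only a sketch. The input shapes below, especially the nested list, are assumptions about what pCHEMELEM and pCHEMSPLIT might receive, not captured FunctionalParsers output. The names elemModifier and splitModifier are hypothetical.

```mathematica
(* The same pure functions that are passed to SetParserModifier;
   elemModifier and splitModifier are hypothetical names used only here *)
elemModifier = StringJoin[Flatten[{#}]] &;
splitModifier = Map[StringJoin, #] &;

elemModifier["o"]            (* a single character: "o" *)
elemModifier[{"h", "e"}]     (* a flat list of characters: "he" *)
elemModifier[{"h", {"e"}}]   (* an assumed nested shape: still "he" *)

splitModifier[{{"t", "i"}, {"t", "a"}, "n", "i", "c"}]
(* {"ti", "ta", "n", "i", "c"} *)
```

Wrapping the input in {#} before Flatten lets elemModifier accept a bare string, a flat list, or a nested list alike.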

The following parsing examples use the generated parser pCHEMSPLIT:

    words = {"titanic", "silicon", "archbishop", "wombat", "mathematica"};
    ParsingTestTable[pCHEMSPLIT, words, "TokenizerFunction" -> Characters]

The table above does not show all possible parsings, because ParseMany is implemented to pick the shortest parsing paths, and the parser generator uses ParseMany for the { ... } parts of the EBNF rules.
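The sketch below is a generic illustration of how committing to a single parsing path can miss valid splits. It is not how FunctionalParsers implements ParseMany. The hypothetical function firstChoiceSplit takes, at each position, the first symbol in the given list that matches, and it never backtracks. With the element symbols C, Cl, O and Th, "cloth" splits as Cl-O-Th. Committing to C first, however, leaves "loth", which no symbol matches. The sketch assumes Wolfram Language 11 or later for StringStartsQ.

```mathematica
(* Hypothetical single-path splitter: commits to the first matching symbol
   (in syms order) at each position and never backtracks.
   Returns the list of symbols, or $Failed if it gets stuck. *)
firstChoiceSplit[word_String, syms_List] :=
  Module[{rest = word, out = {}, sym = None},
   While[rest =!= "" && sym =!= $Failed,
    sym = SelectFirst[syms, StringStartsQ[rest, #] &, $Failed];
    If[sym =!= $Failed,
     AppendTo[out, sym];
     rest = StringDrop[rest, StringLength[sym]]]];
   If[rest === "", out, $Failed]];

firstChoiceSplit["cloth", {"c", "cl", "o", "th"}]
(* $Failed: "c" is taken first and "loth" cannot be matched *)

firstChoiceSplit["cloth", {"cl", "c", "o", "th"}]
(* {"cl", "o", "th"} *)
```

The outcome depends on which alternative is tried first. A branching strategy avoids this by keeping every alternative alive.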

With an alternative, greedier ParseMany implementation, ParseManyByBranching, we can get all possible valid parsings:

    pChemElem = ParseModify[DeleteDuplicates, ParseApply[Flatten, ParseManyByBranching[pCHEMELEM]]];
    words = {"titanic", "silicon", "archbishop", "wombat", "mathematica"};
    ParsingTestTable[pChemElem, words, "TokenizerFunction" -> Characters]
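A brute-force enumerator can cross-check the table above without using FunctionalParsers. It lists every split of a word into element symbols. This is a sketch: allChemSplits is a hypothetical helper, and the example uses a small hand-picked symbol list so the output is easy to verify by hand. To check the real words, pass elemNames from the grammar-generation step instead. The code assumes Wolfram Language 11 or later for StringStartsQ.

```mathematica
(* Hypothetical helper: all ways to split a lowercase word into symbols from syms.
   Returns {{}} for the empty word and {} when no split exists. *)
allChemSplits[word_String, syms_List] :=
  If[word === "",
   {{}},
   Join @@ Table[
     (* prepend the matched symbol to every split of the remainder *)
     Prepend[#, sym] & /@ allChemSplits[StringDrop[word, StringLength[sym]], syms],
     {sym, Select[syms, StringStartsQ[word, #] &]}]];

(* Small subset of lowercase element symbols *)
syms = {"c", "co", "i", "li", "n", "o", "s", "si"};

allChemSplits["silicon", syms]
(* {{"s", "i", "li", "c", "o", "n"}, {"s", "i", "li", "co", "n"},
    {"si", "li", "c", "o", "n"}, {"si", "li", "co", "n"}} *)

allChemSplits["wombat", syms]
(* {} *)
```

The recursion explores every matching symbol at each position, so its running time grows with the number of splits. That is fine for single words.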

edited Jun 2, 2017 at 23:49

Anton Antonov

answered Jun 2, 2017 at 23:43
