Anton Antonov

I think this answer provides an elegant solution -- at least for the less critical readers.

Note that the grammar generation code is the most complicated part. The rest is fairly direct and straightforward.

The following code generates an EBNF grammar for sequences of chemical elements:

    (* Load the FunctionalParsers package *)
    Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/FunctionalParsers.m"]

    (* Lower-case abbreviations of all chemical elements *)
    elemNames = ToLowerCase[Sort[ElementData[#, "Abbreviation"] & /@ ElementData["*"]]];

    (* One alternative per element; multi-letter symbols become comma-separated character sequences *)
    ebnfChemElem = "<chem-elem> = " <>
       StringReplace[
        ToString[
         If[StringLength[#] == 1,
            "'" <> # <> "'",
            StringJoin @@ Riffle["'" <> # <> "'" & /@ Characters[#], ","]] & /@ elemNames],
        {", " -> " | ", "{" -> "", "}" -> ""}] <> ";";

    ebnfChemSplit = "<chem-split> = { <chem-elem> } ;";

    ebnfChemElem <> "\n" <> ebnfChemSplit

The <chem-split> rule specifies that a <chem-split> is a sequence of zero or more <chem-elem> strings.
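To see the shape of the generated rule without calling ElementData, the same construction can be run on a small list. This is only an illustrative sketch: `smallNames` and `smallRule` are hypothetical names, and the three symbols are hand-picked stand-ins for the full `elemNames`.

```mathematica
(* Hypothetical three-symbol list standing in for the full elemNames *)
smallNames = {"h", "he", "li"};

(* Same construction as for ebnfChemElem: quote every character, and join
   the characters of a multi-letter symbol with "," (no space), so that
   only the ", " separators between symbols are turned into " | " *)
smallRule = "<chem-elem> = " <>
   StringReplace[
    ToString[
     If[StringLength[#] == 1,
        "'" <> # <> "'",
        StringJoin @@ Riffle["'" <> # <> "'" & /@ Characters[#], ","]] & /@ smallNames],
    {", " -> " | ", "{" -> "", "}" -> ""}] <> ";"

(* "<chem-elem> = 'h' | 'h','e' | 'l','i';" *)
```

Each alternative is either a single quoted character or a comma-separated sequence of quoted characters, which is why the parsers later need a modifier to join the characters back into one symbol.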

The following code generates parsers and adds parser modifiers to concatenate the characters:

    res = GenerateParsersFromEBNF[ParseToEBNFTokens[ebnfChemElem <> ebnfChemSplit]];
    LeafCount[res]
    (* 4646 *)

    (* Join the characters of each parsed element, then of each element in a split *)
    SetParserModifier[pCHEMELEM, StringJoin[Flatten[{#}]] &];
    SetParserModifier[pCHEMSPLIT, Map[StringJoin, #] &];
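The two modifier functions can be tried on their own, outside the parsers. The sketch below assumes that a raw parse of an element is either a single character or a (possibly nested) list of characters; the exact shape produced by FunctionalParsers.m is not shown here, so the inputs are made up for illustration. The names `joinChars` and `joinEach` are hypothetical.

```mathematica
(* The pure functions passed to SetParserModifier, given hypothetical names *)
joinChars = StringJoin[Flatten[{#}]] &;
joinEach = Map[StringJoin, #] &;

(* Wrapping in a list and flattening handles a bare character and nested lists alike *)
joinChars["h"]                   (* "h" *)
joinChars[{"h", "e"}]            (* "he" *)
joinChars[{{"u", "u"}, "o"}]     (* "uuo" *)

(* Each element of a split is joined separately *)
joinEach[{{"s", "i"}, {"li"}}]   (* {"si", "li"} *)
```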

The following examples parse a few words with the generated pCHEMSPLIT parser:

    words = {"titanic", "silicon", "archbishop", "wombat", "mathematica"};
    ParsingTestTable[pCHEMSPLIT, words, "TokenizerFunction" -> Characters]

The parser above does not produce all possible parses because ParseMany is implemented to pick the shortest parsing paths. The parser generator uses ParseMany for the { ... } parts of EBNF rules.

With an alternative, greedier ParseMany implementation we can get all possible valid parsings:

    pChemElem = ParseModify[DeleteDuplicates,
       ParseApply[Flatten, ParseManyByBranching[pCHEMELEM]]];
    words = {"titanic", "silicon", "archbishop", "wombat", "mathematica"};
    ParsingTestTable[pChemElem, words, "TokenizerFunction" -> Characters]
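As a cross-check on what "all possible valid parsings" means, the splits of a word can also be enumerated with a plain recursive function. This is a minimal sketch that does not use FunctionalParsers.m and makes no claim about how ParseManyByBranching works internally. The helper `allSplits` and the short `symbols` list are hypothetical; symbols are assumed to be at most three characters long.

```mathematica
(* Hypothetical helper, not part of FunctionalParsers.m *)
ClearAll[allSplits];

(* The empty word has exactly one split: the empty one *)
allSplits[symbols_List, ""] := {{}};

(* Try prefixes of length 1 to 3; for each prefix that is a known symbol,
   prepend it to every split of the remaining characters *)
allSplits[symbols_List, word_String] :=
  Join @@ Table[
    If[StringLength[word] >= n && MemberQ[symbols, StringTake[word, n]],
      Prepend[#, StringTake[word, n]] & /@
        allSplits[symbols, StringDrop[word, n]],
      {}],
    {n, 1, 3}];

(* A few real element symbols, enough to split "silicon" *)
symbols = {"s", "i", "si", "li", "c", "co", "o", "n", "no"};

allSplits[symbols, "silicon"]
(* {{"s", "i", "li", "c", "o", "n"}, {"s", "i", "li", "co", "n"},
    {"si", "li", "c", "o", "n"}, {"si", "li", "co", "n"}} *)
```

A word that cannot be fully split, such as "wombat" with this symbol list, gives an empty list.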

(Initially I did not think that posting this answer would contribute to the discussion, but then it occurred to me that it is a continuation of another discussion with Mr.Wizard over programming styles.)
