How to parse an HTML document into an AST that includes line numbers for each node?

Question

I'd like to use JavaScript to parse an HTML document into an abstract syntax tree in which each node includes its start and end line numbers (and ideally character positions as well). Are there any existing solutions that can do this? I'd rather not write one myself.

Edit Apr 24, 2016: Being able to parse HTML with PHP tags in arbitrary places would be even better.

Nope. For now I'm just using some regexes that handle most cases until I have time to return and write a real parser myself.
– EricP, Commented Mar 16, 2015 at 21:05
Oh, OK. I found this: github.com/HenrikJoreteg/html-parse-stringify. It's missing line numbers, though, so I might try to add that if I have time.
– Ivan Bacher, Commented Mar 18, 2015 at 10:15

Michael buller · Accepted Answer · 2018-02-09 14:41:34Z

6

unified (https://unifiedjs.github.io/) can produce a CST or AST for several formats, including HTML.


Amor · Accepted Answer · 2023-02-22 22:35:20Z

I used node-html-parser, and it works like a charm. Character positions are easy to access through each element's 'range' property:

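const { parse } = require('node-html-parser');
// range is [startOffset, endOffset] of the element within the source string
const scripts = parse(code).getElementsByTagName('script');
const pureCode = code.slice(scripts[0].range[0], scripts[0].range[1]);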
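The 'range' values above are character offsets, while the question asks for line numbers. A minimal sketch of how an offset could be turned into a line and column follows. The helper name createLineIndex is hypothetical (it is not part of node-html-parser), and the sketch assumes lines are separated by '\n' and that lines and columns are counted from 1.

```javascript
// Hypothetical helper (not part of node-html-parser): builds a lookup that
// maps a 0-based character offset to a 1-based line and column.
// Assumes lines are terminated by '\n'.
function createLineIndex(source) {
  // lineStarts[i] is the offset at which line i + 1 begins.
  const lineStarts = [0];
  for (let i = 0; i < source.length; i++) {
    if (source[i] === '\n') lineStarts.push(i + 1);
  }

  return function locate(offset) {
    // Binary search for the last line start that is <= offset.
    let lo = 0;
    let hi = lineStarts.length - 1;
    while (lo < hi) {
      const mid = (lo + hi + 1) >> 1;
      if (lineStarts[mid] <= offset) lo = mid;
      else hi = mid - 1;
    }
    return { line: lo + 1, column: offset - lineStarts[lo] + 1 };
  };
}

// Example with a plain string; indexOf stands in for a parser-supplied offset.
const html = '<html>\n  <body>\n    <script>x = 1</script>\n  </body>\n</html>';
const locate = createLineIndex(html);
console.log(locate(html.indexOf('<script>'))); // { line: 3, column: 5 }
```

With the snippet from the answer, the same lookup could be applied as locate(scripts[0].range[0]) and locate(scripts[0].range[1]) to get start and end positions for an element. Building the index once per document keeps each lookup logarithmic in the number of lines.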
