8

I'd like to use JavaScript to parse an HTML document into an abstract syntax tree in which each node records its start and end line numbers (and ideally its character positions too). Are there any existing solutions that can do this? I'd rather not write one myself.

Edit Apr 24, 2016: Ideally, it would also parse HTML that has PHP tags in arbitrary places.

3
  • Did you find something? Commented Mar 16, 2015 at 17:23
  • Nope. For now I'm just using some regexes that handle most cases until I have time to return and write a real parser myself. Commented Mar 16, 2015 at 21:05
  • 1
    Oh, OK. I found this: github.com/HenrikJoreteg/html-parse-stringify. It's missing line numbers, though, so I might try to add that if I have time. Commented Mar 18, 2015 at 10:15

2 Answers

6

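unified (https://unifiedjs.github.io/) can give you a syntax tree (CST or AST) for several formats, including HTML. Its parsers attach positional information (line, column, and offset) to the nodes they produce.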
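To illustrate what such positional data looks like, here is a minimal sketch. It assumes the unist convention used across the unified ecosystem, where a node may carry a `position` with `start` and `end` points, each holding a 1-based `line` and `column` and a 0-based `offset`. The `tree` below is a hand-written stand-in for the source `<p>hi</p>`, not real parser output, and `collectPositions` is a hypothetical helper name.

```javascript
// Hand-written stand-in for a tree parsed from "<p>hi</p>" (assumption:
// unist-style nodes with an optional `position` field).
const tree = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'p',
      position: {
        start: { line: 1, column: 1, offset: 0 },
        end: { line: 1, column: 10, offset: 9 },
      },
      children: [
        {
          type: 'text',
          value: 'hi',
          position: {
            start: { line: 1, column: 4, offset: 3 },
            end: { line: 1, column: 6, offset: 5 },
          },
        },
      ],
    },
  ],
};

// Hypothetical helper: walk the tree depth-first and collect every node
// that carries positional information.
function collectPositions(node, out = []) {
  if (node.position) {
    out.push({
      label: node.tagName || node.type,
      start: node.position.start,
      end: node.position.end,
    });
  }
  for (const child of node.children || []) {
    collectPositions(child, out);
  }
  return out;
}

const positions = collectPositions(tree);
for (const p of positions) {
  console.log(`${p.label}: ${p.start.line}:${p.start.column} to ${p.end.line}:${p.end.column}`);
}
```

With a real parser the tree would come from the library rather than a literal, but the traversal should look much the same.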


0

I used node-html-parser, and it works like a charm. Each node's character positions are easy to get from its 'range' property:

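import { parse } from 'node-html-parser';
// `code` is the HTML source string; `range` is a [start, end] pair of character offsets into it.
const scripts = parse(code).getElementsByTagName('script');
const pureCode = code.slice(scripts[0].range[0], scripts[0].range[1]);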
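Since the question asks for line numbers and a range only gives character offsets, the offsets can be converted afterwards. The following is a rough sketch in plain JavaScript, with no library involved; `buildLineStarts` and `offsetToLineColumn` are hypothetical helper names, and the `range` here is computed with `indexOf` to stand in for a `[start, end]` pair like the one above.

```javascript
// Record the offset at which each line of the source begins.
function buildLineStarts(text) {
  const starts = [0];
  for (let i = 0; i < text.length; i++) {
    if (text[i] === '\n') starts.push(i + 1);
  }
  return starts;
}

// Convert a 0-based character offset into a 1-based line and column,
// using a binary search over the line-start table.
function offsetToLineColumn(lineStarts, offset) {
  let lo = 0;
  let hi = lineStarts.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (lineStarts[mid] <= offset) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  return { line: lo + 1, column: offset - lineStarts[lo] + 1 };
}

const html = '<html>\n<body>\n<script>let x = 1;</script>\n</body>\n</html>';
const lineStarts = buildLineStarts(html);

// Stand-in for a [start, end] range (end is exclusive here).
const range = [
  html.indexOf('<script>'),
  html.indexOf('</script>') + '</script>'.length,
];

const start = offsetToLineColumn(lineStarts, range[0]);
const end = offsetToLineColumn(lineStarts, range[1]);
console.log(start, end);
```

Building the line-start table once keeps each lookup cheap, which matters if every node in a large document needs converting. The sketch only treats `\n` as a line break.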
