The fast & forgiving HTML/XML parser, on JSR.io.
htmlparser2 is the fastest HTML parser, and takes some shortcuts to get there. If you need strict HTML spec compliance, have a look at parse5.
| Name | Description |
|---|---|
| htmlparser2 | Fast & forgiving HTML/XML parser |
| domhandler | Handler for htmlparser2 that turns documents into a DOM |
| domutils | Utilities for working with domhandler's DOM |
| css-select | CSS selector engine, compatible with domhandler's DOM |
| cheerio | The jQuery API for domhandler's DOM |
| dom-serializer | Serializer for domhandler's DOM |
htmlparser2 itself provides a callback interface that allows consumption of documents with minimal allocations. For a more ergonomic experience, read Getting a DOM below.
import * as htmlparser2 from "@victr/htmlparser2"; const parser = new htmlparser2.Parser({ onopentag(name, attributes) { /* * This fires when a new tag is opened. * * If you don't need an aggregated `attributes` object, * have a look at the `onopentagname` and `onattribute` events. */ if ( name === "script" && attributes.type === "text/javascript" ) { console.log("JS! Hooray!"); } }, ontext(text) { /* * Fires whenever a section of text was processed. * * Note that this can fire at any point within text and you might * have to stitch together multiple pieces. */ console.log("-->", text); }, onclosetag(tagname) { /* * Fires when a tag is closed. * * You can rely on this event only firing when you have received an * equivalent opening tag before. Closing tags without corresponding * opening tags will be ignored. */ if (tagname === "script") { console.log("That's it?!"); } }, }); parser.write( "Xyz <script type='text/javascript'>const foo = '<<bar>>';</script>", ); parser.end();Output (with multiple text events combined):
--> Xyz JS! Hooray! --> const foo = '<<bar>>'; That's it?! This example only shows three of the possible events. Read more about the parser, its events and options in the wiki.
The DomHandler produces a DOM (document object model) that can be manipulated using the DomUtils helper.
import * as htmlparser2 from "@victr/htmlparser2"; const dom = htmlparser2.parseDocument(htmlString);The DomHandler, while still bundled with this module, was moved to its own module. Have a look at that for further information.
htmlparser2 makes it easy to parse RSS, RDF and Atom feeds, by providing a parseFeed method:
const feed = htmlparser2.parseFeed(content, options);After having some artificial benchmarks for some time, @AndreasMadsen published his htmlparser-benchmark, which benchmarks HTML parses based on real-world websites.
At the time of writing, the latest versions of all supported parsers show the following performance characteristics on GitHub Actions (sourced from here):
htmlparser2 : 2.17215 ms/file ± 3.81587 node-html-parser : 2.35983 ms/file ± 1.54487 html5parser : 2.43468 ms/file ± 2.81501 neutron-html5parser: 2.61356 ms/file ± 1.70324 htmlparser2-dom : 3.09034 ms/file ± 4.77033 html-dom-parser : 3.56804 ms/file ± 5.15621 libxmljs : 4.07490 ms/file ± 2.99869 htmljs-parser : 6.15812 ms/file ± 7.52497 parse5 : 9.70406 ms/file ± 6.74872 htmlparser : 15.0596 ms/file ± 89.0826 html-parser : 28.6282 ms/file ± 22.6652 saxes : 45.7921 ms/file ± 128.691 html5 : 120.844 ms/file ± 153.944