A fast HTML5 parser with CSS selectors, using the Modest engine.
From PyPI using pip:

    pip install selectolax

Development version from GitHub:

    git clone --recursive https://github.com/rushter/selectolax
    cd selectolax
    pip install -r requirements_dev.txt
    python setup.py install

How to compile selectolax while developing:

    make clean
    make dev

Basic usage:

    from selectolax.parser import HTMLParser

    html = "<div><p id=p1><p id=p2><p id=p3><a>link</a><p id=p4><p id=p5>text<p id=p6></div>"
    selector = "div > :nth-child(2n+1):not(:has(a))"

    for node in HTMLParser(html).css(selector):
        print(node.attributes, node.text(), node.tag)
        print(node.parent.tag)
        print(node.html)

- Average of 10 experiments to parse and retrieve URLs from 800 Google SERP pages.

| Package | Time | Memory (peak) |
|---|---|---|
| selectolax | 2.38 sec. | 768.11 MB |
| lxml | 18.67 sec. | 769.21 MB |
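
The benchmark script itself is not shown here, so the following is only a minimal sketch of how an "average of 10 runs" timing like the one above could be taken. Everything in it is an assumption made for illustration: the `benchmark` and `extract_urls` names are hypothetical, the input is a placeholder string rather than real SERP pages, and the standard library's `html.parser` stands in for the parser under test so the snippet runs without third-party packages. It measures wall-clock time only, not peak memory.

```python
import time
from html.parser import HTMLParser as StdlibHTMLParser
from statistics import mean


class LinkCollector(StdlibHTMLParser):
    """Collects href values from <a> tags (stdlib stand-in for a real parser)."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.urls.append(value)


def extract_urls(html):
    """Hypothetical extraction function; swap in the parser under test."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.urls


def benchmark(extract, pages, runs=10):
    """Return the mean wall-clock time in seconds over `runs` full passes."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        for page in pages:
            extract(page)
        timings.append(time.perf_counter() - start)
    return mean(timings)


# Placeholder input (assumption): the original benchmark used 800 saved SERP pages.
pages = ['<div><a href="https://example.com/1">one</a><a href="/2">two</a></div>'] * 800
average_seconds = benchmark(extract_urls, pages, runs=10)
print(f"average over 10 runs: {average_seconds:.4f} sec.")
```

To time selectolax with the same harness, `extract_urls` could be replaced by something along the lines of `[a.attributes.get("href") for a in HTMLParser(html).css("a")]`, using the `HTMLParser` class from `selectolax.parser` shown in the usage example.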