
ra2003/selectolax

selectolax

A fast HTML5 parser with CSS selector support, built on the Modest engine.

Installation

From PyPI using pip:

```shell
pip install selectolax
```

Development version from GitHub:

```shell
git clone --recursive https://github.com/rushter/selectolax
cd selectolax
pip install -r requirements_dev.txt
python setup.py install
```

How to compile selectolax while developing:

```shell
make clean
make dev
```

Examples

```python
from selectolax.parser import HTMLParser

html = "<div><p id=p1><p id=p2><p id=p3><a>link</a><p id=p4><p id=p5>text<p id=p6></div>"
selector = "div > :nth-child(2n+1):not(:has(a))"

for node in HTMLParser(html).css(selector):
    print(node.attributes, node.text(), node.tag)
    print(node.parent.tag)
    print(node.html)
```
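Unpacking the selector helps explain why only `p1` and `p5` match. The HTML5 parser closes each `<p>` implicitly when the next one opens, so the `<div>` ends up with six `<p>` children, and `p3` is the only one containing an `<a>`. `:nth-child(2n+1)` keeps positions 1, 3 and 5, and `:not(:has(a))` then drops `p3`. The plain-Python sketch below models that matching rule. It illustrates the selector semantics only and is not how Modest implements them; the `matches_nth_child` helper and the `children` list are hypothetical.

```python
from typing import List, Tuple


def matches_nth_child(position: int, a: int, b: int) -> bool:
    """Return True if a 1-based child position satisfies an+b for some n >= 0."""
    if a == 0:
        return position == b
    n, remainder = divmod(position - b, a)
    return remainder == 0 and n >= 0


# Hypothetical model of the <div>'s children after parsing: the HTML5 parser
# closes each <p> when the next one opens. Each entry is (id, has an <a> descendant).
children: List[Tuple[str, bool]] = [
    ("p1", False),
    ("p2", False),
    ("p3", True),   # <p id=p3><a>link</a>
    ("p4", False),
    ("p5", False),
    ("p6", False),
]

# Mirrors: div > :nth-child(2n+1):not(:has(a))
selected = [
    child_id
    for position, (child_id, has_link) in enumerate(children, start=1)
    if matches_nth_child(position, a=2, b=1) and not has_link
]
print(selected)  # ['p1', 'p5']
```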

Simple Benchmark

Average of 10 experiments parsing 800 Google SERP pages and retrieving their URLs:

| Package    | Time       | Memory (peak) |
|------------|------------|---------------|
| selectolax | 2.38 sec.  | 768.11 MB     |
| lxml       | 18.67 sec. | 769.21 MB     |
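The benchmark script itself is not included here, so the following is only a minimal sketch of the kind of measurement described: average wall-clock time over 10 runs of a URL-extraction task. To run without extra dependencies it uses the standard library's `html.parser` as a stand-in parser; the `LinkCollector`, `extract_urls` and `average_runtime` names and the repeated sample page are hypothetical, and peak memory is not measured.

```python
import statistics
import time
from html.parser import HTMLParser as StdlibHTMLParser
from typing import Callable, List


class LinkCollector(StdlibHTMLParser):
    """Collects the href value of every <a> start tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                # value is None for a bare attribute such as <a href>.
                if name == "href" and value is not None:
                    self.urls.append(value)


def extract_urls(page: str) -> List[str]:
    """Stand-in URL extractor built on the standard library parser."""
    collector = LinkCollector()
    collector.feed(page)
    collector.close()
    return collector.urls


def average_runtime(task: Callable[[], object], experiments: int = 10) -> float:
    """Run `task` `experiments` times and return the mean wall-clock seconds."""
    timings: List[float] = []
    for _ in range(experiments):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


if __name__ == "__main__":
    # Hypothetical input: 800 copies of one small page instead of real SERPs.
    pages = ['<div><a href="https://example.com/1">one</a><a href="/2">two</a></div>'] * 800
    mean_seconds = average_runtime(lambda: [extract_urls(p) for p in pages])
    print(f"mean over 10 runs: {mean_seconds:.4f} sec.")
```

To time selectolax the same way, one could pass `lambda: [[n.attributes.get("href") for n in HTMLParser(p).css("a")] for p in pages]` to `average_runtime`, with `HTMLParser` imported from `selectolax.parser` as in the example above.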

About

Python binding to the Modest engine (a fast HTML5 parser with CSS selector support).

Languages

  • Python 96.6%
  • Makefile 3.4%