fix parse html RecursionError #486

Open
521xueweihan wants to merge 1 commit into psf:master from 521xueweihan:master


521xueweihan commented Oct 20, 2021

Fixes the RecursionError raised when parsing the HTML of https://db-engines.com/en/ranking.

fix parse html RecursionError
surister (Member) commented Feb 26, 2023

Reproduce:

Python 3.10.9 (main, Dec 19 2022, 17:35:49) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from requests_html import HTMLSession
>>> session = HTMLSession()
>>> p = session.get('https://db-engines.com/en/ranking')
>>> p.html.text
Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 33, in fromstring
    return _parse(data, beautifulsoup, makeelement, **bsargs)
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 79, in _parse
    root = _convert_tree(tree, makeelement)
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 152, in _convert_tree
    res_root = convert_node(html_root)
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 216, in convert_node
    return handler(bs_node, parent)
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 255, in convert_tag
    handler(child, res)
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 255, in convert_tag
    handler(child, res)
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 255, in convert_tag
    handler(child, res)
  [Previous line repeated 985 more times]
  File "/usr/lib/python3.10/site-packages/lxml/html/soupparser.py", line 242, in convert_tag
    res = etree.SubElement(parent, bs_node.name, attrib=attribs)
  File "src/lxml/etree.pyx", line 3156, in lxml.etree.SubElement
  File "src/lxml/apihelpers.pxi", line 199, in lxml.etree._makeSubElement
  File "src/lxml/apihelpers.pxi", line 195, in lxml.etree._makeSubElement
  File "src/lxml/etree.pyx", line 1630, in lxml.etree._elementFactory
  File "src/lxml/classlookup.pxi", line 403, in lxml.etree._parser_class_lookup
  File "src/lxml/classlookup.pxi", line 456, in lxml.etree._custom_class_lookup
  File "/usr/lib/python3.10/site-packages/lxml/html/__init__.py", line 734, in lookup
    if node_type == 'element':
RecursionError: maximum recursion depth exceeded in comparison
>>>
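
For anyone reading the traceback: `convert_tag` calls itself once per nesting level, so deeply nested markup can exhaust Python's recursion limit. Below is a rough stdlib-only sketch of the same failure mode. `Node`, `convert`, `build_chain`, and `conversion_overflows` are hypothetical stand-ins for illustration, not lxml or requests_html code.

```python
import sys


class Node:
    """Hypothetical stand-in for a parsed tag that holds child nodes."""

    def __init__(self):
        self.children = []


def convert(node):
    """Recursively convert a node; each nesting level costs stack frames."""
    return [convert(child) for child in node.children]


def build_chain(depth):
    """Build `depth` nested nodes iteratively, so building never recurses."""
    root = Node()
    current = root
    for _ in range(depth - 1):
        child = Node()
        current.children.append(child)
        current = child
    return root


def conversion_overflows(depth):
    """Return True if converting a chain of this depth raises RecursionError."""
    try:
        convert(build_chain(depth))
    except RecursionError:
        return True
    return False


if __name__ == "__main__":
    # Assumption: nesting deeper than the interpreter's limit is enough to fail.
    print(conversion_overflows(sys.getrecursionlimit() + 50))
```

A chain like this could probably serve as the basis for a regression test, with the nested nodes replaced by generated nested HTML.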
surister (Member) commented

@521xueweihan

I'd love to see a test for this, and perhaps the proposed fix could be slightly refactored, since we could do:

try:
    ...
except (Exception1, Exception2):
    pass

I realize it's been a couple of years, so I understand if you are no longer interested or active in this repo. In a few days I will do it myself, and I will reference this PR to give you some credit.
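
To make the suggested refactor concrete, here is a minimal sketch of catching several exception types in one `except` clause. The name `text_or_default`, the choice of `RecursionError` and `ValueError`, and the fallback value are assumptions for illustration only; the PR's actual change is not shown in this conversation.

```python
def text_or_default(extract, default=""):
    """Call `extract()` and return `default` if it raises a listed exception.

    `extract` is a hypothetical zero-argument callable standing in for
    whatever parsing step may fail.
    """
    try:
        return extract()
    except (RecursionError, ValueError):
        # One clause handles both types; any other exception still propagates.
        return default


def recurse_forever():
    """Deliberately exceed the recursion limit to simulate the failure."""
    return recurse_forever()
```

Returning a default instead of using a bare `pass` keeps the fallback explicit, though whether swallowing the error is the right behavior here is a separate design question.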
