I had the same issue in a multiprocessing context. It can be illustrated by the following snippet:
```python
from multiprocessing import Pool

import lxml.html


def process(html):
    tree = lxml.html.fromstring(html)
    body = tree.find('.//body')
    print(body)
    return body


def main():
    pool = Pool()
    result = pool.apply(process, ('<html><body/></html>',))
    print(type(result))
    print(result)


if __name__ == '__main__':
    main()
```
Running it produces the following output:
```
<Element body at 0x7f9f690461d8>
<class 'lxml.html.HtmlElement'>
Traceback (most recent call last):
  File "test.py", line 18, in <module>
    main()
  File "test.py", line 14, in main
    print(result)
  File "src/lxml/lxml.etree.pyx", line 1142, in lxml.etree._Element.__repr__ (src/lxml/lxml.etree.c:54748)
  File "src/lxml/lxml.etree.pyx", line 992, in lxml.etree._Element.tag.__get__ (src/lxml/lxml.etree.c:53182)
  File "src/lxml/apihelpers.pxi", line 19, in lxml.etree._assertValidNode (src/lxml/lxml.etree.c:16856)
AssertionError: invalid Element proxy at 139697870845496
```
Given that `__repr__` works inside the worker process while the returned value fails in the calling process, the most obvious explanation is a deserialisation issue: lxml element proxies do not survive pickling across process boundaries. It can be solved, for example, by returning `lxml.html.tostring(body)`, or any other picklable object.
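A minimal sketch of that workaround, serialising the element in the worker before it crosses the process boundary (same snippet as above, with only the return value changed):

```python
from multiprocessing import Pool

import lxml.html


def process(html):
    tree = lxml.html.fromstring(html)
    body = tree.find('.//body')
    # Serialise before returning: bytes pickle cleanly between processes,
    # whereas lxml Element proxies do not survive the round trip.
    return lxml.html.tostring(body)


def main():
    with Pool() as pool:
        result = pool.apply(process, ('<html><body/></html>',))
    print(type(result))  # <class 'bytes'>
    print(result)


if __name__ == '__main__':
    main()
```

If the caller needs to work with the element again, it can re-parse the returned bytes with `lxml.html.fromstring(result)` on its side.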