I have an Ubuntu machine running Python 2.7.6. When I try to use lxml, which was installed with pip, I get the following error:

    Traceback (most recent call last):
      File "./export.py", line 44, in fetch_item
        root.append(elem)
      File "lxml.etree.pyx", line 742, in lxml.etree._Element.append (src/lxml/lxml.etree.c:44339)
      File "apihelpers.pxi", line 24, in lxml.etree._assertValidNode (src/lxml/lxml.etree.c:14127)
    AssertionError: invalid Element proxy at 140443984439416

What does this mean, and how should I go about fixing this?

  • So what does root.append(elem) do in your code? Where does elem come from? Commented Apr 10, 2015 at 21:09
  • The traceback tells you that whatever you are trying to append is not a valid node. So we'll need to see your code to ascertain what you are trying to do here and how you can fix this. Commented Apr 10, 2015 at 21:12
  • Could you please answer @MartijnPieters's question above? Are you using the multiprocessing module? Commented Oct 5, 2018 at 9:34

1 Answer

I had the same issue in a multiprocessing context. The following snippet illustrates it:

    from multiprocessing import Pool
    import lxml.html

    def process(html):
        tree = lxml.html.fromstring(html)
        body = tree.find('.//body')
        print(body)
        return body

    def main():
        pool = Pool()
        result = pool.apply(process, ('<html><body/></html>',))
        print(type(result))
        print(result)


    if __name__ == '__main__':
        main()

Running it produces the following output:

    <Element body at 0x7f9f690461d8>
    <class 'lxml.html.HtmlElement'>
    Traceback (most recent call last):
      File "test.py", line 18, in <module>
        main()
      File "test.py", line 14, in main
        print(result)
      File "src/lxml/lxml.etree.pyx", line 1142, in lxml.etree._Element.__repr__ (src/lxml/lxml.etree.c:54748)
      File "src/lxml/lxml.etree.pyx", line 992, in lxml.etree._Element.tag.__get__ (src/lxml/lxml.etree.c:53182)
      File "src/lxml/apihelpers.pxi", line 19, in lxml.etree._assertValidNode (src/lxml/lxml.etree.c:16856)
    AssertionError: invalid Element proxy at 139697870845496

Since __repr__ works in the worker process and the return value does reach the calling process, the most obvious explanation is a deserialisation problem: the element does not survive being pickled and sent back to the parent. It can be solved, for example, by returning lxml.html.tostring(body), or any other picklable object, instead of the element itself.
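A minimal sketch of that workaround, assuming Python 3 and a well-formed input string: the worker serialises the node with tostring() and the parent re-parses the returned bytes. The try/except import is an assumption added here so the sketch also runs on the standard library's xml.etree.ElementTree when lxml is not installed, which is why it uses the XML API (etree) rather than lxml.html. Re-parsing gives a new, detached element, not the original node inside its tree.

```python
import multiprocessing

try:
    from lxml import etree  # the library from the question
except ImportError:
    # Assumption: stdlib stand-in so the sketch runs without lxml installed
    import xml.etree.ElementTree as etree


def process(markup):
    """Worker: parse, pick the node, and return it serialised as bytes."""
    tree = etree.fromstring(markup)
    body = tree.find('.//body')
    # Return bytes: they pickle cleanly, whereas an lxml element
    # arrives in the parent as an invalid proxy.
    return etree.tostring(body)


def main():
    with multiprocessing.Pool(1) as pool:
        serialised = pool.apply(process, ('<html><body/></html>',))
    # Re-parse in the parent to get a valid element again
    body = etree.fromstring(serialised)
    print(body.tag)  # prints: body


if __name__ == '__main__':
    main()
```

With lxml.html the same idea applies: call lxml.html.tostring(body) in the worker and parse the bytes again in the parent.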

1 Comment

Yes, lxml elements cannot be pickled (as of now), so they cannot be transferred between processes by the multiprocessing package. See: bugs.launchpad.net/lxml/+bug/736708
