I'm trying to parse a web page so I can save some of its data to an Excel or CSV file.

    import urllib.request
    import xml.etree.ElementTree as ET

    url = "http://rusdrama.com/afisha"
    response = urllib.request.urlopen(url)
    content = response.read()
    root = ET.fromstring(content)

When I parse the page with ElementTree's fromstring method, I get the following error:

    Traceback (most recent call last):
      File "D:/PythonProjects/PythonMisc/theater_reader.py", line 7, in <module>
        root = ET.fromstring(content)
      File "D:\Python\Python35\lib\xml\etree\ElementTree.py", line 1333, in XML
        parser.feed(text)
    xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 49, column 14

The relevant part of the received page is the following:

    <script>
    jQuery(document).ready(function(){
        jQuery(window).scroll(function() {
            var scroll = jQuery(window).scrollTop();
            if (scroll >= 100) {
                jQuery(".t3-header").addClass("solid");
            }
            if (scroll <= 100) {
                jQuery(".t3-header").removeClass("solid");
            }
        });
    })
    </script>

And specifically line 49:

 if (scroll <= 100) { 

So the problem is the opening angle bracket in `<=`, which seems to be treated as the start of a tag. I have seen several similar questions but can't work out how to handle this situation.

  • You are opening this with an XML parser. XML requires <, > and & to be escaped. Commented Nov 16, 2016 at 20:28
  • You may want to use an HTML parser instead. Commented Nov 16, 2016 at 20:29
  • Thank you! It didn't occur to me to use a non-XML parser. Commented Nov 27, 2016 at 8:53

1 Answer

You are trying to parse HTML with an XML parser. HTML is not required to be well-formed XML, so use a proper tool, an HTML parser, instead: BeautifulSoup and lxml.html are the most popular.
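
To see the difference without touching the network, here is a minimal sketch using only the standard library. SNIPPET is a made-up stand-in for the real page, and xml_parse_error, TitleParser and html_title are hypothetical helper names, not part of any library. It assumes the only problem is the bare `<` inside the script body; the real page may have other non-XML constructs as well.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical stand-in for the page: a <script> body containing a bare "<".
SNIPPET = (
    "<html><head><title>Afisha</title>"
    "<script>if (scroll <= 100) { }</script>"
    "</head></html>"
)


def xml_parse_error(text):
    """Return ElementTree's ParseError message, or None if the text parses as XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


class TitleParser(HTMLParser):
    """Collects the text inside <title>; script bodies arrive as plain data."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def html_title(text):
    """Parse the text as HTML and return the contents of its <title> element."""
    parser = TitleParser()
    parser.feed(text)
    parser.close()
    return parser.title


print(xml_parse_error(SNIPPET))  # the XML parser rejects the "<" in "<="
print(html_title(SNIPPET))       # the HTML parser reads the title without complaint
```

The XML parser gives up at the `<` in `<=`, while the HTML parser treats everything inside the script element as raw text and carries on.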

Demo:

    >>> from bs4 import BeautifulSoup
    >>> import urllib.request
    >>>
    >>> url = "http://rusdrama.com/afisha"
    >>> response = urllib.request.urlopen(url)
    >>>
    >>> soup = BeautifulSoup(response, "html.parser")
    >>> print(soup.title.get_text())
    Афиша Харьковского академического русского драматического театра Пушкина
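
Since the goal is a CSV file, here is a rough sketch of that last step. I don't know which fields you want from the page, so it collects link text and href as a placeholder; LinkCollector, links_to_csv and SAMPLE are hypothetical names and data made up for illustration. It uses the standard-library html.parser so it runs without extra packages; with BeautifulSoup you could build the same rows from soup.find_all("a").

```python
import csv
import io
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (text, href) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only accumulate text while inside a link that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.rows.append(("".join(self._text).strip(), self._href))
            self._href = None


def links_to_csv(html, out):
    """Write one CSV row per link in `html` to the file-like `out`; return the row count."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    writer = csv.writer(out)
    writer.writerow(["text", "href"])
    writer.writerows(collector.rows)
    return len(collector.rows)


# Hypothetical sample markup; the real input would be the decoded page content.
SAMPLE = (
    '<ul><li><a href="/play/1">Hamlet</a></li>'
    '<li><a href="/play/2">Revizor, part 2</a></li></ul>'
)

buffer = io.StringIO()
count = links_to_csv(SAMPLE, buffer)
print(count)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, pass a file opened with open("afisha.csv", "w", newline="", encoding="utf-8"); the file name is only an example. The csv module handles quoting, so values containing commas stay in one column.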