I'm trying to parse a web page and save some of its data to an Excel or CSV file.
    import urllib.request
    import xml.etree.ElementTree as ET

    url = "http://rusdrama.com/afisha"
    response = urllib.request.urlopen(url)
    content = response.read()
    root = ET.fromstring(content)

When I parse the page with ElementTree's fromstring method, I get the following error:
    Traceback (most recent call last):
      File "D:/PythonProjects/PythonMisc/theater_reader.py", line 7, in <module>
        root = ET.fromstring(content)
      File "D:\Python\Python35\lib\xml\etree\ElementTree.py", line 1333, in XML
        parser.feed(text)
    xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 49, column 14

The relevant part of the received page is the following:
    <script>
    jQuery(document).ready(function(){
        jQuery(window).scroll(function() {
            var scroll = jQuery(window).scrollTop();
            if (scroll >= 100) {
                jQuery(".t3-header").addClass("solid");
            }
            if (scroll <= 100) {
                jQuery(".t3-header").removeClass("solid");
            }
        });
    })
    </script>

And specifically line 49:
    if (scroll <= 100) {

So the problem is the opening angle bracket, which seems to be treated as the start of a tag. I have seen several similar questions but can't work out how to handle this situation.
<, > and & to be escaped.
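The failure can be reproduced without fetching the page. The sketch below is only an illustration under stated assumptions: the `raw` string is a hypothetical stand-in for the downloaded markup, and `parses_as_xml`, `escape_markup` and `ScriptCollector` are names made up for this example. It shows that ElementTree rejects a bare `<` inside a script element, accepts the same text once it is escaped, and that the standard library's `html.parser` (an HTML parser rather than an XML one) reads the unescaped version without complaint.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical stand-in for the downloaded page: a bare "<" inside <script>.
SCRIPT_BODY = "if (scroll <= 100) { run(); }"
raw = "<html><script>" + SCRIPT_BODY + "</script></html>"


def parses_as_xml(text):
    """Return True if ElementTree accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def escape_markup(text):
    """Escape the three characters XML reserves; "&" goes first so the
    entities added afterwards are not escaped a second time."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


escaped = "<html><script>" + escape_markup(SCRIPT_BODY) + "</script></html>"


class ScriptCollector(HTMLParser):
    """Collect the text found inside <script> elements."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


collector = ScriptCollector()
collector.feed(raw)
collector.close()
collected = "".join(collector.chunks)

print(parses_as_xml(raw))      # the unescaped page is not well-formed XML
print(parses_as_xml(escaped))  # the escaped variant is
print(collected)               # script text as seen by the HTML parser
```

Escaping by hand is only practical for markup you generate yourself. For a page fetched from someone else's site, feeding the bytes to an HTML parser instead of an XML parser is probably the more realistic route.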