My goal: get the page source from a URL and count all instances of a keyword within that page source.

How I am doing it: getting the page source via urllib2, then looping through each character of the page source and comparing it to the keyword.

My problem: my keyword is encoded in UTF-8 while the page source is in ASCII, and I run into errors whenever I try conversions.
Getting the page source:

    import urllib2

    response = urllib2.urlopen(myUrl)
    return response.read()  # read() gives a byte string (str in Python 2)
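One way to sidestep the mismatch is to decode the bytes right after fetching them, using the charset the server reports in its Content-Type header. This is only a minimal sketch under that assumption; the helper name fetch_page_text and the UTF-8 fallback are my own choices, not part of the code above.

    import urllib2

    def fetch_page_text(url):
        """Fetch a page and return it as a unicode string (Python 2)."""
        response = urllib2.urlopen(url)
        raw = response.read()                          # byte string
        charset = response.info().getparam('charset')  # None if the server omits it
        return raw.decode(charset or 'utf-8')          # assumed fallback: UTF-8

If the server omits the charset parameter, the fallback kicks in; a page that only declares its encoding in a &lt;meta&gt; tag would need that tag parsed separately.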
Comparing the page source and the keyword:

    pageSource[i] == keyWord[j]

I need to convert one of these strings to the other's encoding. Intuitively, I felt that converting from ASCII (the page source) to UTF-8 (the keyword) would be the best and easiest, so:
    pageSource = unicode(pageSource)

which raises:

    UnicodeDecodeError: 'ascii' codec can't decode byte __ in position __: ordinal not in range(128)
(The byte reported in my case is 0x41, which is the same in UTF-8.)
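In Python 2, unicode(pageSource) with no encoding argument falls back to the ASCII codec, which is why the call fails as soon as it meets a byte it cannot map. A minimal sketch of the fix, assuming the page bytes are actually UTF-8: decode both strings with an explicit codec, then count with unicode.count() instead of comparing characters by hand. The sample pageSource and keyWord values here are made up for illustration.

    pageSource = 'Un caf\xc3\xa9, deux caf\xc3\xa9s.'  # stand-in for response.read(): UTF-8 bytes
    keyWord = 'caf\xc3\xa9'                            # UTF-8 encoded keyword

    page_text = pageSource.decode('utf-8')     # explicit codec instead of the implicit ASCII default
    keyword_text = keyWord.decode('utf-8')

    print page_text.count(keyword_text)        # prints 2 (non-overlapping occurrences)

Decoding both sides to unicode (rather than encoding the keyword to bytes) keeps the comparison character-based, so multi-byte characters are never split mid-sequence.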