I'm trying to pass large strings of arbitrary HTML through regular expressions, and my Python 2.6 script is choking on this:
UnicodeEncodeError: 'ascii' codec can't encode character
I traced it to the trademark sign (™, U+2122) at the end of this word: Protection™. I expect to run into other characters like it in the future.
Is there a module for processing non-ASCII characters? Or, more generally, what is the best way to handle or escape non-ASCII text in Python?
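For reference, the standard library already covers this through the error handlers of `unicode.encode()`. The minimal sketch below applies them to the word from the question (`word` is an illustrative variable, not from the script). It runs on Python 2.6+ and 3.3+. Which handler is "best" depends on whether the character must be kept (UTF-8), dropped (`ignore`), or escaped for HTML (`xmlcharrefreplace`).

```python
# Sketch: encode() error handlers applied to the word from the question.
word = u'Protection\u2122'  # u'\u2122' is the TRADE MARK SIGN from the traceback

# A strict ASCII encode fails the same way the script does.
try:
    word.encode('ascii')
except UnicodeEncodeError as exc:
    print('strict ascii failed at index %d' % exc.start)

utf8 = word.encode('utf-8')                          # keep the character as UTF-8 bytes
ignored = word.encode('ascii', 'ignore')             # silently drop it
replaced = word.encode('ascii', 'replace')           # substitute '?'
escaped = word.encode('ascii', 'xmlcharrefreplace')  # numeric HTML/XML entity
```

Here `escaped` is `b'Protection&#8482;'`, which stays pure ASCII but still renders as ™ in a browser.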
Thanks! Full error:
```
E
======================================================================
ERROR: test_untitled (__main__.Untitled)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python26\Test2.py", line 26, in test_untitled
    ofile.write(Whois + '\n')
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2122' in position 1005: ordinal not in range(128)
```

Full script:
```python
from selenium import selenium
import unittest, time, re, csv, logging


class Untitled(unittest.TestCase):
    def setUp(self):
        self.verificationErrors = []
        self.selenium = selenium("localhost", 4444, "*firefox", "http://www.BaseDomain.com/")
        self.selenium.start()
        self.selenium.set_timeout("90000")

    def test_untitled(self):
        sel = self.selenium
        spamReader = csv.reader(open('SubDomainList.csv', 'rb'))
        for row in spamReader:
            sel.open(row[0])
            time.sleep(10)
            # get_text() returns unicode, which may contain non-ASCII characters
            Test = sel.get_text("//html/body/div/table/tbody/tr/td/form/div/table/tbody/tr[7]/td")
            Test = Test.replace(",", "")
            Test = Test.replace("\n", "")
            ofile = open('TestOut.csv', 'ab')
            # Writing unicode to a byte-mode file encodes it implicitly as ASCII
            ofile.write(Test + '\n')
            ofile.close()

    def tearDown(self):
        self.selenium.stop()
        self.assertEqual([], self.verificationErrors)


if __name__ == "__main__":
    unittest.main()
```
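The failing line is the `write()` call: on Python 2, a unicode string written to a file opened with plain `open(..., 'ab')` is implicitly encoded with the ASCII codec. A minimal sketch of one fix, assuming `Test` is unicode (the `u'\u2122'` in the traceback suggests it is), is to open the output file with an explicit encoding via `io.open`. The helper name `append_line` is hypothetical, not part of the original script.

```python
import io


def append_line(path, text):
    """Append `text` plus a newline to `path`, encoded as UTF-8.

    Hypothetical helper. io.open() in text mode encodes unicode with the
    given codec instead of the implicit ASCII that Python 2's plain file
    objects use. newline='' disables newline translation, matching the
    original 'ab' mode (no '\r\n' on Windows).
    """
    with io.open(path, 'a', encoding='utf-8', newline='') as ofile:
        ofile.write(text + u'\n')
```

Inside the loop, the three `ofile` lines would then become `append_line('TestOut.csv', Test)`. If the CSV must stay pure ASCII instead, `Test.encode('ascii', 'xmlcharrefreplace')` before a byte-mode write is the alternative.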