0

I am teaching myself how to parse Google search results as JSON, but when I run this code (which should work), I get this error: UnicodeEncodeError: 'charmap' codec can't encode character u'\u2014' in position 5: character maps to <undefined>. Can someone help me?

    import urllib
    import simplejson

    query = urllib.urlencode({'q' : 'site:example.com'})
    url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s&start=50' \
        % (query)
    search_results = urllib.urlopen(url)
    json = simplejson.loads(search_results.read())
    results = json['responseData']['results']
    for i in results:
        print i['title'] + ": " + i['url']
1
  • 2
    Can you include a traceback so we can pinpoint the source of the problem? Commented Sep 6, 2012 at 22:07

3 Answers

1

This error may be caused by the encoding your console uses when Unicode data is written to stdout. There's an article that talks about it.

Check stdout's encoding:

    >>> import sys
    >>> sys.stdout.encoding  # On my machine I get this result:
    'UTF-8'
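
If the reported encoding cannot represent a character such as U+2014, one possible workaround is to replace unencodable characters before printing. This is only a sketch: `console_safe` is a hypothetical helper name, the title is made up, and cp437 is an assumption standing in for a legacy Windows console code page. It is written so that it also runs on Python 3.

```python
import sys

def console_safe(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by '?'."""
    # Fall back to ASCII when stdout declares no encoding (e.g. when piped).
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # "replace" turns unencodable characters into '?'; decoding again gives
    # back text that the console can print without raising.
    return text.encode(encoding, "replace").decode(encoding)

title = u"Example \u2014 Home"  # made-up title containing U+2014

# cp437 is assumed here; it has no mapping for U+2014.
try:
    title.encode("cp437")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(strict_ok)                     # False
print(console_safe(title, "cp437"))  # Example ? Home
```

In the question's loop, the same idea would mean passing each title through a helper like this before printing. The trade-off is that the dash is lost instead of the script crashing.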

0

Use unicode literals.

print i[u'title'] + u": " + i[u'url'] 

Also:

jsondata = simplejson.load(search_results) 
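
To illustrate the difference between the two calls, here is a small sketch. It uses the standard library `json` module in place of `simplejson` (an assumption; both expose `load` and `loads`), and an in-memory `io.StringIO` object with a made-up payload stands in for the HTTP response.

```python
import io
import json

# Made-up payload shaped like the response in the question.
payload = u'{"responseData": {"results": [{"title": "Example", "url": "http://example.com/"}]}}'

# load() takes a file-like object and calls .read() on it itself.
fake_response = io.StringIO(payload)
from_load = json.load(fake_response)

# loads() takes the already-read string.
from_loads = json.loads(payload)

print(from_load == from_loads)  # True
```

Both calls produce the same parsed data, so switching to `load` tidies the code but is not expected to change what gets printed.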

2 Comments

Well, I changed the print line from print i['title'] + ": " + i['url'] to print i[u'title'] + u": " + i[u'url'], but nothing changed. What do you mean by jsondata = simplejson.load(search_results)?
@user1505497: No need to do .loads(search_results.read()) because the .load() function (no 's') does the reading for you (more efficiently).
0

My guess is that the error is in the simplejson.loads(search_results.read()) line, possibly because the default encoding your Python is picking up is not UTF-8 while Google is returning UTF-8.

Try: simplejson.loads(unicode(search_results.read(), "utf8")).
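
A rough sketch of that decode-then-parse step, under some assumptions: the byte string below is a made-up stand-in for what `search_results.read()` returns, the standard library `json` module replaces `simplejson`, and `.decode("utf-8")` is used as the equivalent of `unicode(..., "utf8")` so the snippet also runs on Python 3.

```python
import json

# Made-up UTF-8 bytes standing in for search_results.read().
raw_bytes = (
    u'{"responseData": {"results": '
    u'[{"title": "Example \u2014 Home", "url": "http://example.com/"}]}}'
).encode("utf-8")

# Decode explicitly as UTF-8 first, then parse the resulting text.
text = raw_bytes.decode("utf-8")
data = json.loads(text)

first = data["responseData"]["results"][0]
# json.dumps escapes non-ASCII by default, so this line prints safely
# on any console.
print(json.dumps(first["title"]))
```

Note that a UnicodeEncodeError is raised when text is converted to bytes, for example while printing, so decoding the response explicitly may not be enough on its own if the console encoding is the limiting factor.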
