2

I want to scrape some content from a webpage. This is the code:

import requests
from bs4 import BeautifulSoup
import urllib2

url = "anUrl"
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
print soup.prettify()

This is the error: UnicodeEncodeError: 'charmap' codec can't encode character u'\u2013' in position : character maps to <undefined>

The offending character will differ from page to page, not always be the same one, so I need a generic solution.

4
  • What are you using for a console, i.e. where is the print output going? Commented Oct 15, 2015 at 15:14
  • I'm printing it on the command line, but I need to display it in a browser. Commented Oct 15, 2015 at 15:15
  • But is it Windows, Linux, or something else? And if you put it on a browser you won't be using print anymore, correct? Commented Oct 15, 2015 at 15:17
  • Windows. Yes, I'm running some tests on the command line first, then I will change the output. Commented Oct 15, 2015 at 15:20

2 Answers

2

I think you have the same problem as this question: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 3 2: ordinal not in range(128)

So you can use u'\u2013'.encode('utf8') :) (to be more specific, use soup.prettify().encode('utf8'))

Or switch to Python 3 ;)
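To show where the encode call goes, here is a minimal sketch. It assumes a cp437 Windows console (a common default, but only an assumption here) and uses a short stand-in string in place of the scraped page; `page_text` and `console_encoding` are illustrative names, not part of the question's code.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# Stand-in for soup.prettify(): unicode text containing EN DASH (U+2013).
page_text = u"2010\u20132015"

# Assumed console code page; cp437 has no EN DASH, so strict encoding fails.
console_encoding = "cp437"
try:
    page_text.encode(console_encoding)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# UTF-8 can represent every Unicode character, so this does not raise.
utf8_bytes = page_text.encode("utf8")
print(strict_failed, repr(utf8_bytes))
```

In the question's Python 2 script the equivalent line would be print soup.prettify().encode('utf8'). On a console that is not set to UTF-8 the dash may display as a few odd characters, but no exception is raised.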


3 Comments

I've already looked at that answer. I'm forced to use Python 2.*, but I don't know where to put u'\u2013'.encode('utf8') in my code.
It should be r.text.encode('utf8') (r.content is already a byte string, so it does not need encoding). I don't know exactly where you get the error.
You don't say exactly where you're getting your error, but from your description it sounds like you might need to properly encode the pretty soup going out to the terminal with: print soup.prettify().encode('utf8').
1

To fix the print command, you can explicitly encode the output. You have many different choices depending on how you want to treat Unicode characters.

If you simply want to eliminate any characters that aren't supported by your console:

import sys
print soup.prettify().encode(sys.stdout.encoding, 'ignore')

If you want to replace characters that aren't supported with a placeholder character (typically a question mark):

print soup.prettify().encode(sys.stdout.encoding, 'replace') 

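If you want to show unsupported characters as an escape sequence (note that raw_unicode_escape only escapes characters above U+00FF; characters in the Latin-1 range are written as single bytes):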

print soup.prettify().encode('raw_unicode_escape') 
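The options above can be compared side by side. The following is a rough sketch, not a definitive implementation: `console_safe` is a hypothetical helper, cp437 is an assumed console code page, and the fallback to ASCII covers the case where sys.stdout.encoding is None (for example when output is piped). It also includes the 'backslashreplace' error handler, which was not mentioned above but escapes every character the target encoding lacks.

```python
from __future__ import print_function
import sys


def console_safe(text, errors="replace", encoding=None):
    """Encode unicode text for the console without raising (hypothetical helper)."""
    # Fall back to ASCII when the console encoding is unknown.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, errors)


sample = u"a\u2013b"  # EN DASH between two ASCII letters

ignored = console_safe(sample, "ignore", "cp437")            # b"ab"
replaced = console_safe(sample, "replace", "cp437")          # b"a?b"
escaped = console_safe(sample, "backslashreplace", "cp437")  # b"a\\u2013b"

# raw_unicode_escape escapes U+2013 but leaves Latin-1 characters as raw bytes.
raw_escaped = sample.encode("raw_unicode_escape")            # b"a\\u2013b"
latin1_raw = u"\xe9".encode("raw_unicode_escape")            # b"\xe9"

for name, value in [("ignore", ignored), ("replace", replaced),
                    ("backslashreplace", escaped),
                    ("raw_unicode_escape", raw_escaped)]:
    print(name, repr(value))
```

Which handler to pick depends on whether losing, masking, or spelling out the unsupported characters is least harmful for the test output.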

When you're ready to write to HTML output, you should encode it consistently to the encoding that your web page will use, preferably UTF-8.

f.write(soup.prettify().encode('utf-8')) 
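As a sketch of the file-writing step, assuming the output goes to a local HTML file: io.open lets the file object do the UTF-8 encoding, so unicode text can be written directly. The `html` string and the temporary `out.html` path are placeholders standing in for soup.prettify() and the real destination.

```python
from __future__ import print_function
import io
import os
import tempfile

# Placeholder for soup.prettify(); contains EN DASH (U+2013).
html = u"<p>2010\u20132015</p>"

# Hypothetical output location in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "out.html")

# The file object encodes to UTF-8 on write, so no manual .encode() is needed.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(html)

# Read the raw bytes back to confirm what was stored on disk.
with io.open(path, "rb") as f:
    raw = f.read()

print(repr(raw))
```

For the browser to decode the page correctly, the page should also declare the same encoding, for example with a meta charset tag or in the Content-Type header.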

2 Comments

Do you know how to print the Python script's output in a browser through JavaScript? In a previous Python script I used print "Content-type: text\n\n", but in that case I was not using BeautifulSoup, so now I'm not able to pass a useful object to the JS script.
@Poggio sorry, I haven't yet used Python to output a web page so it's outside of my area of expertise.
