How to solve encoding error in Python

Question

I want to scrape some content from a webpage. This is the code:

import requests
from bs4 import BeautifulSoup
import urllib2
url = "anUrl"
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')
print soup.prettify()

This is the error description: UnicodeEncodeError: 'charmap' codec can't encode character u'\u2013' in position : character maps to <undefined>

This kind of error can be triggered by different characters, not always the same one, so I need a generic solution.
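The failure can be reproduced without any network request. The following is a minimal sketch, assuming the console uses a legacy code page such as cp437 (an assumption for illustration; the question does not say which code page is active). The helper name can_encode is hypothetical.

```python
from __future__ import print_function


def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


sample = u"2010\u20132015"  # contains U+2013 EN DASH

# cp437 is assumed here; it has no mapping for U+2013, so encoding fails.
print("cp437:", can_encode(sample, "cp437"))
# UTF-8 can represent every Unicode character, so encoding succeeds.
print("utf-8:", can_encode(sample, "utf-8"))
```

In Python 2, print performs this encoding step implicitly with the console's encoding, which is why any character missing from that code page can trigger the error.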

What are you using for a console, i.e. where is the print output going?
– Mark Ransom, Commented Oct 15, 2015 at 15:14
I'm printing it on the command line, but I need to display it in a browser.
– Poggio, Commented Oct 15, 2015 at 15:15
But is it Windows, Linux, or something else? And if you put it in a browser you won't be using print anymore, correct?
– Mark Ransom, Commented Oct 15, 2015 at 15:17
Windows. Yes, I'm testing on the command line first, then I will change the output.
– Poggio, Commented Oct 15, 2015 at 15:20

Community · Accepted Answer · 2017-05-23 12:14:42Z

2

I think you have the same problem as: UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 3 2: ordinal not in range(128)

So you can use u'\u2013'.encode('utf8') :) (to be more specific, use soup.prettify().encode('utf8'))

Or switch to Python 3 ;)
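As a rough sketch of where the encode call goes: the string below is a hypothetical stand-in for soup.prettify(), which returns unicode text, and to_utf8 is an illustrative helper name, not part of any library.

```python
from __future__ import print_function


def to_utf8(text):
    """Encode unicode text to UTF-8 bytes."""
    return text.encode("utf8")


# Hypothetical stand-in for soup.prettify().
pretty = u"<p>2010\u20132015</p>"

encoded = to_utf8(pretty)
# repr() is used so the bytes are shown the same way on any console.
print(repr(encoded))
```

In Python 2, printing the encoded bytes avoids the exception because no implicit encoding step is left for print to perform. If the console's code page is not UTF-8, the non-ASCII bytes may still be displayed as garbled characters.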

edited May 23, 2017 at 12:14

CommunityBot

answered Oct 15, 2015 at 15:17

Labo

3 Comments

Poggio Over a year ago

I've already looked at that answer. I'm forced to use Python 2.*, but I don't know where to put u'\u2013'.encode('utf8') in my code.

EsseTi Over a year ago

It should be r.text.encode('utf8') or r.content.encode('utf8'); I don't know where exactly you get the error.

xnx Over a year ago

You don't say exactly where you're getting your error, but from your description it sounds like you might need to properly encode the pretty soup going out to the terminal with: print soup.prettify().encode('utf8').

Mark Ransom · Accepted Answer · 2015-10-15 15:46:40Z

To fix the print statement, you can explicitly encode the output. You have several choices depending on how you want to treat Unicode characters that the console cannot display.

If you simply want to eliminate any characters that aren't supported by your console:

import sys
print soup.prettify().encode(sys.stdout.encoding, 'ignore')

If you want to replace characters that aren't supported with a placeholder character (typically a question mark):

print soup.prettify().encode(sys.stdout.encoding, 'replace')
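The two error handlers above can be compared side by side. This is a minimal sketch: encode_for_console is a hypothetical helper, cp437 is an assumed console code page chosen so the result is predictable, and the ASCII fallback is likewise an assumption.

```python
from __future__ import print_function
import sys


def encode_for_console(text, errors="replace", encoding=None):
    """Encode text for the console, handling unsupported characters."""
    if encoding is None:
        # In Python 2, sys.stdout.encoding can be None (for example when
        # output is piped), so fall back to ASCII (an assumed default).
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, errors)


sample = u"2010\u20132015"  # stand-in for soup.prettify()

print(repr(encode_for_console(sample, "ignore", "cp437")))
print(repr(encode_for_console(sample, "replace", "cp437")))
```

With cp437, 'ignore' silently drops the en dash, while 'replace' substitutes a question mark, so 'replace' at least leaves a visible trace that something was lost.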

If you want to show characters as escape sequences (raw_unicode_escape escapes code points above U+00FF as \uXXXX and emits Latin-1 characters as single bytes):

print soup.prettify().encode('raw_unicode_escape')
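To see how the escape-style options treat different characters, the sketch below encodes one sample string three ways. The helper name escape_variants and the sample text are illustrative; unicode_escape and the backslashreplace error handler are standard alternatives that the answer does not mention.

```python
from __future__ import print_function


def escape_variants(text):
    """Encode text with three escape-producing options for comparison."""
    return {
        # Escapes only code points above U+00FF; e-acute stays a raw byte.
        "raw_unicode_escape": text.encode("raw_unicode_escape"),
        # Escapes every non-ASCII character (and non-printable ASCII).
        "unicode_escape": text.encode("unicode_escape"),
        # Pure ASCII output; each unsupported character becomes an escape.
        "backslashreplace": text.encode("ascii", "backslashreplace"),
    }


sample = u"caf\xe9 \u2013"  # one Latin-1 character and one en dash

for name, data in sorted(escape_variants(sample).items()):
    print(name, repr(data))
```

If the goal is output that is guaranteed to be ASCII, encode('ascii', 'backslashreplace') is likely the safer choice of the three, since it leaves ordinary characters such as newlines untouched.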

When you're ready to write to HTML output, you should encode it consistently to the encoding that your web page will use, preferably UTF-8.

f.write(soup.prettify().encode('utf-8'))
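The line above assumes an already opened file object f. One possible way to set that up is sketched here with io.open, which encodes unicode text on write in both Python 2 and Python 3, so no explicit encode call is needed. The function name write_html, the file name, and the sample markup are all hypothetical.

```python
import io
import os
import tempfile


def write_html(path, html):
    """Write unicode text to path as UTF-8."""
    # io.open handles the encoding, so html is passed as unicode text.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(html)


# Illustrative output location and stand-in for soup.prettify().
out_path = os.path.join(tempfile.gettempdir(), "output.html")
write_html(out_path, u"<p>2010\u20132015</p>")
```

Opening the file with an explicit encoding keeps the choice of UTF-8 in one place rather than on every write call.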

Do you know how to display the Python script's output in a browser through JavaScript? In a previous Python script I used this: print "Content-type: text\n\n", but in that case I was not using BeautifulSoup, so now I'm not able to pass a useful object to the JS script.
@Poggio sorry, I haven't yet used Python to output a web page, so it's outside my area of expertise.
