Python 2.7, Requests library, can't get unicode

Question

The documentation for the Requests library says that requests.get() always returns unicode. But when I check which encoding was used, I see "windows-1251". That's a problem. When I try to print requests.get(url).text, I get an error, because the content at this URL contains Cyrillic characters.

    import requests

    url = 'https://www.weblancer.net/jobs/'
    r = requests.get(url)
    print r.encoding
    print r.text

I got something like this:

    windows-1251
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 256-263: ordinal not in range(128)

Is this a problem with Python 2.7, or is there no problem at all? Please help.

Use .content, not .text. Also, where are you running it from?
– Padraic Cunningham, Commented Aug 24, 2016 at 22:03
I believe it's a print problem. Python needs to convert the text to ASCII to print it in the terminal, but that's impossible.
– Darth Kotik, Commented Aug 24, 2016 at 22:03
I run it from Sublime Text with Anaconda. Yes, I've tried running this code from the console, and r.text returned HTML. But r.encoding still returns "windows-1251", and type(r.text) returns "unicode". It's driving me crazy.
– GolovDanil, Commented Aug 24, 2016 at 22:19
When I run the above code, I get all the text. I don't get the error you mention.
– GreenAsJade, Commented Aug 24, 2016 at 22:31
Can you provide the full error message, so we can confirm it's coming from where you think it's coming from?
– GreenAsJade, Commented Aug 24, 2016 at 22:44

GreenAsJade · Accepted Answer · 2016-08-24 23:16:44Z

From the docs:

Requests will automatically decode content from the server. Most unicode charsets are seamlessly decoded.

When you make a request, Requests makes educated guesses about the encoding of the response based on the HTTP headers.

The encoding attribute of the response (r.encoding in your code) tells you which encoding was used to convert the byte stream from the server into the Unicode text in r.text.

In your case it is correct: the headers in the response say that the character set is windows-1251.
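As a rough illustration of that decoding step, here is a stdlib-only sketch. The helper `charset_from_content_type`, the header value, and the sample bytes are all made up for the example; this is not Requests' own code, and the ISO-8859-1 fallback is an assumption about what a client might use when no charset is declared. It is written to run under both Python 2.7 and Python 3.

```python
def charset_from_content_type(value, default="ISO-8859-1"):
    """Return the charset parameter of a Content-Type value, or a default.

    Hypothetical helper for illustration only.
    """
    for part in value.split(";")[1:]:
        name, _, val = part.strip().partition("=")
        if name.strip().lower() == "charset":
            return val.strip().strip("'\"")
    return default


# Assumed header value and body bytes, standing in for a real response.
content_type = "text/html; charset=windows-1251"
raw = b"\xcf\xf0\xe8\xe2\xe5\xf2"  # six Cyrillic letters encoded as windows-1251

encoding = charset_from_content_type(content_type)
text = raw.decode(encoding)  # bytes -> Unicode text, no ASCII involved
```

Here `encoding` plays the role of r.encoding and `text` the role of r.text: the declared charset only says how the bytes were turned into Unicode.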

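The error you are seeing happens after that step. Your Python is trying to encode the Unicode text as ASCII in order to print it, and failing. This typically happens in Python 2 when stdout has no declared encoding, for example when the output is piped into an editor's build panel rather than written to a terminal.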
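The failure can be reproduced without any network access. The sketch below uses a made-up six-letter Cyrillic string in place of the page text and encodes it explicitly, which is roughly what print does implicitly when the output stream falls back to ASCII.

```python
# Sample Unicode text (an assumption standing in for r.text).
text = u"\u041f\u0440\u0438\u0432\u0435\u0442"

try:
    text.encode("ascii")  # Cyrillic code points are outside the ASCII range
    failed = False
    message = ""
except UnicodeEncodeError as exc:
    failed = True
    message = str(exc)

# Choosing an encoding that can represent the characters avoids the error.
as_utf8 = text.encode("utf-8")

# Alternatively, ask the codec to substitute characters it cannot encode.
as_ascii_lossy = text.encode("ascii", "replace")
```

The explicit encode raises the same kind of UnicodeEncodeError as in the question, which suggests the decoding done by Requests is not at fault.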

You can write print r.text.encode(r.encoding), which gives the same result as Padraic's suggestion in the comments: printing r.content.
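Why the two are equivalent can be seen in a small round trip. The names `content`, `encoding` and `text` below are stand-ins for r.content, r.encoding and r.text, and the bytes are sample data; the equality holds as long as every byte decodes cleanly in the chosen charset.

```python
content = b"\xcf\xf0\xe8\xe2\xe5\xf2"  # sample bytes as sent by a server
encoding = "windows-1251"              # charset used to decode them

text = content.decode(encoding)        # what the Unicode text would hold
round_trip = text.encode(encoding)     # re-encoding with the same charset

same_bytes = (round_trip == content)   # the original bytes come back
```

In both cases the bytes written to the output are in windows-1251, so they display correctly only if the terminal or editor expects that encoding.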

Note: the response's encoding attribute is writable. If Requests guessed wrongly, you can set r.encoding to the correct value before reading r.text.
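To see what a wrong guess looks like, the sketch below decodes the same sample bytes twice using only the standard library. With Requests, the corresponding fix would be assigning the right charset to r.encoding before accessing r.text. The byte string and both charset choices are assumptions for the example.

```python
raw = b"\xcf\xf0\xe8\xe2\xe5\xf2"  # sample body, actually windows-1251

# A wrong guess still decodes, but yields Latin-1 accented letters (mojibake).
wrong = raw.decode("ISO-8859-1")

# Overriding the guess with the real charset yields the Cyrillic text.
right = raw.decode("windows-1251")

looks_garbled = (wrong != right)
```

Note that the wrong guess does not raise an error; it silently produces the wrong characters, which is why checking the encoding is worthwhile when the text looks garbled.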
