I have seen many questions on this subject, yet none of them helped me solve the issue. I have a dataset containing text with many different characters, so I encode the text before making a POST request with the Requests library on Python 2.7.13.

My code is the following:

    # -*- coding: utf-8 -*-
    # encoding=utf8
    import sys
    reload(sys)
    sys.setdefaultencoding('utf8')

    import json
    import requests

    text = """So happy to be together on your birthday! ❤ Thankful for real life. ❤ A post shared by Jessica Chastain (@jessicachastain) on Nov 13, 2016 at 5:22am PST"""
    textX = json.dumps({'text': text.encode('utf-8')})
    r = requests.post('http://####', data=textX, headers={'Content-Type': 'application/json; charset=UTF-8'})
    print(r.text)

The data is sent in JSON format. No matter where I try to encode the text as UTF-8, I'm still getting the following error from Requests.

UnicodeEncodeError: 'latin-1' codec can't encode character '\u2764' in position 42: Body ('❤') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8. 

Edit: I fixed a syntax error in the code above, but it was not the cause of the problem.

  • Aren't you missing a closing ) in textX = json.dumps({'text': text.encode('utf-8'}) ? That should cause a syntax error. Can you post the rest of the traceback? Which line is generating the UnicodeError? Commented Aug 1, 2017 at 19:47
  • @cowbert, you are right, there was a syntax error, but I get the Unicode error even when the syntax is correct. Commented Aug 2, 2017 at 6:22

1 Answer

By default, json.dumps generates an ASCII-only string, which eliminates encoding problems. The mistake in your code is that the text is not a Unicode string (note the u prefix below). Also make sure the source file is saved in the encoding it declares (# coding=utf8):

    # coding=utf8
    import json

    text = u"""So happy to be together on your birthday! ❤ Thankful for real life. ❤ A post shared by Jessica Chastain (@jessicachastain) on Nov 13, 2016 at 5:22am PST"""
    textX = json.dumps({u'text': text})

Output:

'{"text": "So happy to be together on your birthday! \\u2764 Thankful for real life. \\u2764 A post shared by Jessica Chastain (@jessicachastain) on Nov 13, 2016 at 5:22am PST"}' 
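To make the ASCII-only point concrete, here is a minimal sketch using only the standard library. The shortened text and the variable names `escaped`, `body_ascii`, and `round_trip` are my own illustration, not part of the original code. It checks that the default output contains no non-ASCII characters, so it fits in Latin-1 as well, and that the heart survives a round trip through json.loads:

```python
import json

# Shortened sample text (illustrative); \u2764 is the heart character.
text = u"So happy to be together on your birthday! \u2764 Thankful for real life."

# Default behaviour (ensure_ascii=True): non-ASCII characters are
# written as \uXXXX escape sequences.
escaped = json.dumps({u"text": text})

# Because the payload is pure ASCII, encoding it as Latin-1 cannot fail.
body_ascii = escaped.encode("latin-1")

# Whoever parses the JSON gets the original character back.
round_trip = json.loads(escaped)[u"text"]
```

If the string you hand to Requests looks like `escaped`, the body itself should not be what triggers a Latin-1 error.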

2 Comments

Thanks for your answer, @mark-tolonen. Although I get the same output as you do, I still get the following error when I try to post textX: UnicodeEncodeError: 'latin-1' codec can't encode character '\u2764' in position 42: Body ('❤') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
@Furkanicus Actually, my formatting of the output was incorrect, which hid the actual content. It's fixed now. The output doesn't contain a \u2764 character, only the literal escape code '\\u2764', so there is no Unicode character in it that could fail to encode. That suggests the bug is somewhere else. You aren't specifying latin-1 anywhere. JSON can be transmitted with escape codes, and it will still decode correctly when the server processes it on the other end of the connection.
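For comparison, here is a hedged sketch of the situation the error message describes: a body that still contains a literal ❤ as text rather than as bytes or an escape code. As far as I can tell, the wording "Use body.encode('utf-8')" comes from Python 3's http.client, so it may be worth checking which interpreter actually runs the script. The shortened text and the names `raw`, `latin1_ok`, and `body` are illustrative assumptions. The example uses ensure_ascii=False only to reproduce a string with a raw heart in it, then encodes it to UTF-8 bytes as the message recommends:

```python
import json

# Shortened sample text (illustrative); \u2764 is the heart character.
text = u"So happy to be together on your birthday! \u2764 Thankful for real life."

# ensure_ascii=False keeps the heart as a literal character in the string.
raw = json.dumps({u"text": text}, ensure_ascii=False)

# A text body with a character outside Latin-1 cannot be encoded that way.
try:
    raw.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Encoding explicitly yields bytes, which can be passed as data=body
# to requests.post together with a matching charset in Content-Type.
body = raw.encode("utf-8")
```

Passing bytes rather than text means no implicit encoding step is left for the HTTP layer to perform.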
