
I'm using the Python twitter library to fetch tweets from a public stream. The library receives tweets as JSON and converts them to Python structures. What I want is to get the JSON string directly and write it to a file. Internally, the library reads from a network socket, applies .decode('utf8') to the buffer, then wraps the data in a Python structure and returns it. I can use json.JSONEncoder to encode that structure back to a JSON string and save it to a file, but I seem to have a character encoding problem. When I print the JSON string it looks fine in the console, but when I write it to a file, escape sequences such as \u0627\u0644\u0644\u06be\u064f appear instead of the characters.

I tried opening the saved file with different encodings and nothing changed. The file is supposed to be UTF-8 encoded, and when I display it, those escape sequences should be replaced by the actual characters they represent. Am I missing something here? How can I achieve this?

More info:

I'm using Python 2.7.

I open the file like this:

json_file = open('test.json', 'w')

I also tried this:

json_file = codecs.open( 'test.json', 'w', 'utf-8' )

Nothing changed. I blindly tried .encode('utf8') and .decode('utf8') on the JSON string and the result was the same. I viewed the written text in different text editors and with the cat command in the console, and the sequences starting with \u still appear.

Update:

I solved the problem. json.JSONEncoder has an ensure_ascii option:

If ensure_ascii is True (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the results are str instances consisting of ASCII characters only.

I set it to False and the problem went away.
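To illustrate the difference, here is a minimal sketch. The sample dictionary is made up for the example (it is not real output from the twitter library), and it uses json.dumps, which accepts the same ensure_ascii option as the encoder class.

```python
import json

# Hypothetical stand-in for a decoded tweet; the real structure comes from the library.
data = {"text": u"\u0627\u0644\u0644\u06be\u064f"}

# Default behaviour (ensure_ascii=True): every non-ASCII character is written
# as a literal backslash-u escape, so the result contains ASCII characters only.
escaped = json.dumps(data)

# With ensure_ascii=False the characters themselves are kept in the output.
readable = json.dumps(data, ensure_ascii=False)

# Both strings are valid JSON and decode to the same data.
same_data = json.loads(escaped) == json.loads(readable)
```

Because the escaped form is plain ASCII, opening the file with a different encoding or calling .encode('utf8') on the string cannot change how it looks.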

  • What version of python are you using, and can you please include the code you are using to open the file? My guess is you are using Python 2.x and not setting the encoding when you open the file. Commented Nov 30, 2013 at 22:23
  • Did you encode the data when writing it to the file? It would help to include sample data that shows the problem. Commented Dec 1, 2013 at 2:21
  • I used codecs module and encoded the data with utf8 encoding. Commented Dec 1, 2013 at 5:04
  • Please don't put the answer in the question. Post it as an answer instead. Commented Dec 1, 2013 at 14:23
  • It did not allow me to answer my own question because I did not have enough points. It seems I have enough points now, but somebody already copied my solution and posted it as an answer. Funny. What are you doing with all those points? Commented Dec 11, 2013 at 20:09

2 Answers


json.JSONEncoder has an ensure_ascii option:

If ensure_ascii is True (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the results are str instances consisting of ASCII characters only.

Set it to False and the problem will go away.
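A minimal sketch of writing the result to a UTF-8 file follows. The helper name write_json_utf8, the sample data, and the temporary path are all made up for illustration. It uses io.open, which is available on Python 2.7 as well, and assumes the strings in the structure are unicode objects or UTF-8 bytes.

```python
import io
import json
import os
import tempfile


def write_json_utf8(obj, path):
    """Serialize obj to JSON without \\uXXXX escapes and save it as UTF-8."""
    text = json.dumps(obj, ensure_ascii=False)
    # On Python 2.7, dumps may hand back a byte string when the input holds
    # no unicode objects; io.open expects text, so decode in that case.
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    with io.open(path, "w", encoding="utf-8") as json_file:
        json_file.write(text)


# Hypothetical sample standing in for a decoded tweet.
sample = {"text": u"\u0627\u0644\u0644\u06be\u064f"}
path = os.path.join(tempfile.mkdtemp(), "test.json")
write_json_utf8(sample, path)

# Read the file back to check what was actually stored.
with io.open(path, encoding="utf-8") as saved_file:
    saved = saved_file.read()
```

Viewing the file with cat or an editor should then show the characters themselves, provided the terminal or editor is set to UTF-8.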


Well, since you won't post your solution as an answer, I will. This question should not be left showing no answer.

json.JSONEncoder has an ensure_ascii option:

If ensure_ascii is True (the default), all non-ASCII characters in the output are escaped with \uXXXX sequences, and the results are str instances consisting of ASCII characters only.

Set it to False and the problem will go away.
