Thanks for your help in advance. I am trying to read a JSON file into a pandas DataFrame and am getting a cornucopia of unicode/ascii errors. Edit: The error appears to stem from the JSON file being multi-line, with each line its own JSON object.
With a data file that looks like:
"data.json" =
{"_i":{"$o":"5b"},"c_id":"10","p_id":"10","c_c":2,"l_c":59,"u":{"n":"J","id":"1"},"c_t":"2010","m":"Hopefully \n\nEDIT: Actually."}
{"_i":{"$o":"5b"},"p_id":"10","c_id":"10","p_id":"10","c_c":0,"l_c":8,"u":{"n":"S","id":"1"},"c_t":"2010","m":"in-laws?"}
Edit: In response to a comment, the above is not code to be run; it is a sample of my data file, which is saved as a JSON file.
As this is a multi-line file, following the linked question "Loading a file with more than one line of JSON into Python's Pandas", I tried to use:
import pandas
df = pandas.read_json('data.json', lines=True)
This gives the error:
json = u'[' + u','.join(lines) + u']'
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 436: ordinal not in range(128)
According to this issue on GitHub, https://github.com/pandas-dev/pandas/issues/15132, this is because:
This can happen in Python 2.7 if the default encoding is set to ascii (check sys.getdefaultencoding()). StringIO will convert the input string to ascii when lines=True, resulting in a UnicodeDecodeError because of mixing utf-8 and ascii strings.
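To make the quoted explanation concrete, here is a minimal sketch of the same failure outside pandas. It assumes the offending byte 0xf0 is the first byte of a 4-byte UTF-8 character; the emoji used here is only a stand-in, since the actual character in my file is unknown. The helper name try_decode is made up for illustration.

```python
# -*- coding: utf-8 -*-
# Assumption: an emoji stands in for whatever 4-byte character is in the file.
# Its UTF-8 encoding starts with the byte 0xf0.
raw = u"\U0001F600".encode("utf-8")


def try_decode(data, encoding):
    """Decode bytes with the given codec, reporting the failure reason if any."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "failed: %s" % exc.reason


# Decoding UTF-8 bytes as ascii fails, which is what an implicit
# ascii conversion does under Python 2.7's default encoding.
ascii_result = try_decode(raw, "ascii")

# Decoding the same bytes explicitly as utf-8 succeeds.
utf8_result = try_decode(raw, "utf-8")
```

The ascii attempt fails for the same reason given in the traceback above, while the explicit utf-8 decode returns the original character.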
Their solution is to change the system default encoding from ascii to utf-8; however, I understand that this is inadvisable (source: "Changing default encoding of Python?").
I also tried setting the encoding within read_json() to both utf-8 and ascii, but to no avail.
How can I successfully read this json file into a pandas DataFrame, preserving the multi-line structure?
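One possible workaround, sketched under the assumption that the file is UTF-8 encoded with exactly one JSON object per line, is to decode the file explicitly and parse each line with the standard json module, bypassing the lines=True code path. The function name read_json_lines is hypothetical, and I have not confirmed that this is the recommended approach.

```python
import io
import json


def read_json_lines(path, encoding="utf-8"):
    """Parse a file holding one JSON object per line into a list of dicts."""
    records = []
    # io.open decodes explicitly, so the default encoding is never consulted.
    with io.open(path, "r", encoding=encoding) as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines between objects
                records.append(json.loads(line))
    return records
```

The resulting list could then be passed to pandas.DataFrame(records), giving one row per line of the file. Nested objects such as "u" would remain as dicts in a single column.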
Many thanks!
"With a data file that looks like:" Apparently not. Clearly, one is not able to copy and paste this as a JSON file. It was instead provided so that, if someone wanted to test my code, they would be able to save this data and reproduce my error. Apparently I didn't make this clear enough and should have been more verbose. I have edited this information into the question. Thanks for your suggestion.