I am trying to import JSON data, including nested data, into a pandas DataFrame. The example below uses a small public dataset (the GitHub repos endpoint shown in the code). For this illustration, let's focus on the 'owner' column. Note that I'm explicitly casting the data to str. If you print response.content, the response data has proper JSON syntax with double quotes.
However, pd.read_json() seems to convert the JSON object under 'owner' into a string that uses single quotes. Am I doing something wrong here, or should this be raised as a dev issue in read_json()? I see that a different issue relating to single/double quotes was fixed in the past on pandas-dev.
>>> import pandas as pd
>>> import requests
>>> response = requests.get(url='https://api.github.com/users/mralexgray/repos')
>>> df = pd.read_json(response.content, orient='records', dtype=str)
>>> df['owner'].iloc[1, ]
"{'login': 'mralexgray', 'id': 262517, 'node_id': 'MDQ6VXNlcjI2MjUxNw==', 'avatar_url': 'https://avatars.githubusercontent.com/u/262517?v=4',..."
>>> type(df['owner'].iloc[1, ])
str
>>> response.content
b'[{"id":6104546,"node_id":"MDEwOlJlcG9zaXRvcnk2MTA0NTQ2","name":"-REPONAME","full_name":"mralexgray/-REPONAME","private":false,"owner":{"login":"mralexgray","id":262517,"node_id":"MDQ6VXNlcjI2MjUxNw==","avatar_url":"https://avatars.githubusercontent.com/u/262517?v=4"...

The only observation I make here is that pd.read_json() may be importing the nested JSON data as a Python dict before it is cast to str.
I'm running Python 3.8.10 with Pandas 1.2.4.
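To make the suspected mechanism concrete, here is a minimal sketch that uses only the standard library and a made-up, shortened sample record instead of the live API response. It assumes, as the observation above suggests, that the nested object is first parsed into a Python dict and only afterwards cast to str. It does not inspect pandas internals, so treat it as an illustration of the effect rather than a description of what read_json() does internally.

```python
import json

# Hypothetical, shortened stand-in for one record of the API response.
raw = '[{"id": 1, "owner": {"login": "mralexgray", "id": 262517}}]'

# Parsing turns the nested JSON object into a Python dict.
records = json.loads(raw)
owner = records[0]["owner"]

# str() on a dict yields its Python repr, which uses single quotes.
as_repr = str(owner)

# json.dumps() serializes the same dict back to valid JSON (double quotes).
as_json = json.dumps(owner)

print(type(owner).__name__)
print(as_repr)
print(as_json)
```

If the cast really happens after parsing, the single quotes would come from Python's dict repr and not from any rewriting of the original JSON text.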
{"foo": "bar"} instead of {'foo': 'bar'}. Here's a hint: what happens when you try just {"foo": "bar"} by itself at the Python interpreter prompt? Another hint: what happens when you try {"foo": "bar"} == {'foo': 'bar'}? Or "foo" == 'foo'?

dict as in the case of {"foo": "bar"} and {'foo': 'bar'} they are equal. And the same goes for the individual strings 'foo' and "foo". However, my issue is with the content in the JSON string. "{'foo': 'bar'}" is not the same as "{"foo": "bar"}". This becomes an issue for me later when trying to use json_normalize(), where I get json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes:

json_normalize()", then you're going to have to show us a minimal, complete example. As is, what you show us doesn't illustrate a problem, even if you say there is one.

pd.read_json(dtype=str) to import the JSON string as it is (i.e., with double quotes). Yes, json_normalize() becomes problematic, but my focus here was to question whether the way pd.read_json() works is correct. IMO it should cast it directly to a str without first importing it as a dict. And I tried casting to str because I expected the double quotes to remain as in the original JSON string.
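The decoding error mentioned in the comments can be reproduced without pandas. The sketch below uses a hypothetical single-quoted string of the kind that str() produces from a dict, shows that json.loads() rejects it, and then recovers valid JSON with ast.literal_eval() followed by json.dumps(). This is one possible workaround under the assumption that the string is a plain Python literal; it is not something pandas does for you.

```python
import ast
import json

# Hypothetical value as it might look after a dict was cast to str.
single_quoted = "{'foo': 'bar', 'private': False}"

# A single-quoted dict repr is not valid JSON, so json.loads() fails.
try:
    json.loads(single_quoted)
    error_message = None
except json.JSONDecodeError as exc:
    error_message = str(exc)

# ast.literal_eval() safely parses the Python literal back into a dict.
recovered = ast.literal_eval(single_quoted)

# json.dumps() then emits proper JSON with double quotes.
valid_json = json.dumps(recovered)

print(error_message)
print(valid_json)
```

An alternative that avoids the round trip entirely is to leave out dtype=str so the 'owner' values stay as dicts, which can then be flattened directly.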