62

I'm trying to do some simple JSON parsing using Python 3's built-in json module, and from reading a bunch of other questions on SO and googling, it seems this is supposed to be pretty straightforward. However, I think I'm getting a string returned instead of the expected dictionary.

Firstly, here is the JSON I am trying to get values from. It's just some output from Twitter's API:

[{'in_reply_to_status_id_str': None, 'in_reply_to_screen_name': None, 'retweeted': False, 'in_reply_to_status_id': None, 'contributors': None, 'favorite_count': 0, 'in_reply_to_user_id': None, 'coordinates': None, 'source': '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>', 'geo': None, 'retweet_count': 0, 'text': 'Tweeting a url \nhttp://t.co/QDVYv6bV90', 'created_at': 'Mon Sep 01 19:36:25 +0000 2014', 'entities': {'symbols': [], 'user_mentions': [], 'urls': [{'expanded_url': 'http://www.isthereanappthat.com', 'display_url': 'isthereanappthat.com', 'url': 'http://t.co/QDVYv6bV90', 'indices': [16, 38]}], 'hashtags': []}, 'id_str': '506526005943865344', 'in_reply_to_user_id_str': None, 'truncated': False, 'favorited': False, 'lang': 'en', 'possibly_sensitive': False, 'id': 506526005943865344, 'user': {'profile_text_color': '333333', 'time_zone': None, 'entities': {'description': {'urls': []}}, 'url': None, 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'protected': False, 'default_profile_image': True, 'utc_offset': None, 'default_profile': True, 'screen_name': 'KickzWatch', 'follow_request_sent': False, 'following': False, 'profile_background_color': 'C0DEED', 'notifications': False, 'description': '', 'profile_sidebar_border_color': 'C0DEED', 'geo_enabled': False, 'verified': False, 'friends_count': 40, 'created_at': 'Mon Sep 01 16:29:18 +0000 2014', 'is_translator': False, 'profile_sidebar_fill_color': 'DDEEF6', 'statuses_count': 4, 'location': '', 'id_str': '2784389341', 'followers_count': 4, 'favourites_count': 0, 'contributors_enabled': False, 'is_translation_enabled': False, 'lang': 'en', 'profile_image_url': 'http://abs.twimg.com/sticky/default_profile_images/default_profile_6_normal.png', 'profile_image_url_https': 'https://abs.twimg.com/sticky/default_profile_images/default_profile_6_normal.png', 'id': 
2784389341, 'profile_use_background_image': True, 'listed_count': 0, 'profile_background_tile': False, 'name': 'Maktub Destiny', 'profile_link_color': '0084B4'}, 'place': None}] 

I assigned this String to a variable named json_string like so:

json_string = json.dumps(output)
jason = json.loads(json_string)

Then, when I try to get a specific key from the "jason" dictionary:

print(jason['hashtags']) 

I'm getting an error:

TypeError: string indices must be integers 

I want to be able to convert the json output to a dictionary, then use jason[key_name] call to get values using specified keys. Is there something obvious that I'm missing here?

This is my first time working with Python, after coming from Java. I absolutely love the language and think it's very powerful. So, any help on this would be greatly appreciated!

11
  • 2
    1) That data you pasted is a Python data structure, not JSON. 2) The outer data structure is a list, not a dictionary. Commented Sep 1, 2014 at 22:32
  • @LukasGraf Hmmm, interesting. So it is a list containing a dictionary? I just commented out the json logic and just tried output[0]['hashtags'] with no luck. "output" in this case being the Python data structure returned from the call. Any thoughts on how to approach this? Commented Sep 1, 2014 at 22:37
  • 3
    As others pointed out, your JSON input will become a list in Python, not a dict. Also, the code snippet you gave, print(jason['hashtags'], is not even valid Python due to the lack of a closing parenthesis. Please post a syntactically correct example with its output so we can be sure what code is producing what error. Commented Sep 1, 2014 at 22:43
  • I get a different error: TypeError: list indices must be integers, not str. And that is what I would expect: the resulting object is a list, not a string. >>> import json >>> json_string = json.dumps(output) >>> jason = json.loads(json_string) >>> print(jason['hashtags']) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: list indices must be integers, not str Commented Sep 1, 2014 at 22:52
  • 1
    This is also not even valid JSON, which allows only double-quoted strings, and not single-quoted strings. Commented Sep 1, 2014 at 22:53

6 Answers

30

I did json.loads(json.loads(string)) and was able to get the dictionary. The first call does not simply return the same string: it decodes the outer layer (for example, it removes the \\ escape characters), leaving JSON text that the second call can parse.
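
A minimal sketch of why the double call can be needed, assuming the data was JSON-encoded twice before it reached you. The payload dict below is a made-up stand-in, not the Twitter data from the question:

```python
import json

# Hypothetical payload; stands in for whatever the API returned.
payload = {"entities": {"hashtags": []}}

# Encoding twice yields a JSON string literal whose content is JSON text.
double_encoded = json.dumps(json.dumps(payload))

# The first loads() only undoes the outer layer (quotes and backslashes).
once = json.loads(double_encoded)
# The second loads() parses the inner JSON text into Python objects.
twice = json.loads(once)

print(type(once).__name__)   # str
print(type(twice).__name__)  # dict
```

If a single json.loads() already gives you a dict or list, the data was only encoded once and the second call is not needed.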


1 Comment

This works! The response was sending a string instead of json so it's working now.
24

OK, first you should pretty-print your object so that you can read it:

>>> from pprint import pprint
>>> output = [{'in_reply_to_status_id_str': None, 'in_reply_to_screen_name': None, 'retweeted': False, 'in_reply_to_status_id': None, 'contributors': None, 'favorite_count': 0, 'in_reply_to_user_id': None, 'coordinates': None, 'source': '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>', 'geo': None, 'retweet_count': 0, 'text': 'Tweeting a url \nhttp://t.co/QDVYv6bV90', 'created_at': 'Mon Sep 01 19:36:25 +0000 2014', 'entities': {'symbols': [], 'user_mentions': [], 'urls': [{'expanded_url': 'http://www.isthereanappthat.com', 'display_url': 'isthereanappthat.com', 'url': 'http://t.co/QDVYv6bV90', 'indices': [16, 38]}], 'hashtags': []}, 'id_str': '506526005943865344', 'in_reply_to_user_id_str': None, 'truncated': False, 'favorited': False, 'lang': 'en', 'possibly_sensitive': False, 'id': 506526005943865344, 'user': {'profile_text_color': '333333', 'time_zone': None, 'entities': {'description': {'urls': []}}, 'url': None, 'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png', 'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png', 'protected': False, 'default_profile_image': True, 'utc_offset': None, 'default_profile': True, 'screen_name': 'KickzWatch', 'follow_request_sent': False, 'following': False, 'profile_background_color': 'C0DEED', 'notifications': False, 'description': '', 'profile_sidebar_border_color': 'C0DEED', 'geo_enabled': False, 'verified': False, 'friends_count': 40, 'created_at': 'Mon Sep 01 16:29:18 +0000 2014', 'is_translator': False, 'profile_sidebar_fill_color': 'DDEEF6', 'statuses_count': 4, 'location': '', 'id_str': '2784389341', 'followers_count': 4, 'favourites_count': 0, 'contributors_enabled': False, 'is_translation_enabled': False, 'lang': 'en', 'profile_image_url': 'http://abs.twimg.com/sticky/default_profile_images/default_profile_6_normal.png', 'profile_image_url_https': 'https://abs.twimg.com/sticky/default_profile_images/default_profile_6_normal.png', 'id': 2784389341, 'profile_use_background_image': True, 'listed_count': 0, 'profile_background_tile': False, 'name': 'Maktub Destiny', 'profile_link_color': '0084B4'}, 'place': None}]
>>> pprint(output)
[{'contributors': None,
  'coordinates': None,
  'created_at': 'Mon Sep 01 19:36:25 +0000 2014',
  'entities': {'hashtags': [],
               'symbols': [],
               'urls': [{'display_url': 'isthereanappthat.com',
                         'expanded_url': 'http://www.isthereanappthat.com',
                         'indices': [16, 38],
                         'url': 'http://t.co/QDVYv6bV90'}],
               'user_mentions': []},
  'favorite_count': 0,
  'favorited': False,
  'geo': None,
  'id': 506526005943865344,
  'id_str': '506526005943865344',
  'in_reply_to_screen_name': None,
  'in_reply_to_status_id': None,
  'in_reply_to_status_id_str': None,
  'in_reply_to_user_id': None,
  'in_reply_to_user_id_str': None,
  'lang': 'en',
  'place': None,
  'possibly_sensitive': False,
  'retweet_count': 0,
  'retweeted': False,
  'source': '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>',
  'text': 'Tweeting a url \nhttp://t.co/QDVYv6bV90',
  'truncated': False,
  'user': {'contributors_enabled': False,
           'created_at': 'Mon Sep 01 16:29:18 +0000 2014',
           'default_profile': True,
           'default_profile_image': True,
           'description': '',
           'entities': {'description': {'urls': []}},
           'favourites_count': 0,
           'follow_request_sent': False,
           'followers_count': 4,
           'following': False,
           'friends_count': 40,
           'geo_enabled': False,
           'id': 2784389341,
           'id_str': '2784389341',
           'is_translation_enabled': False,
           'is_translator': False,
           'lang': 'en',
           'listed_count': 0,
           'location': '',
           'name': 'Maktub Destiny',
           'notifications': False,
           'profile_background_color': 'C0DEED',
           'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png',
           'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png',
           'profile_background_tile': False,
           'profile_image_url': 'http://abs.twimg.com/sticky/default_profile_images/default_profile_6_normal.png',
           'profile_image_url_https': 'https://abs.twimg.com/sticky/default_profile_images/default_profile_6_normal.png',
           'profile_link_color': '0084B4',
           'profile_sidebar_border_color': 'C0DEED',
           'profile_sidebar_fill_color': 'DDEEF6',
           'profile_text_color': '333333',
           'profile_use_background_image': True,
           'protected': False,
           'screen_name': 'KickzWatch',
           'statuses_count': 4,
           'time_zone': None,
           'url': None,
           'utc_offset': None,
           'verified': False}}]

From looking at this you can see that output is a list which contains a single dict. To access this you need:

>>> first_elem = output[0] 

You will also see that the hashtags key in the first_elem is contained in a second level dict under the key entities:

>>> entities = first_elem['entities']
>>> pprint(entities)
{'hashtags': [],
 'symbols': [],
 'urls': [{'display_url': 'isthereanappthat.com',
           'expanded_url': 'http://www.isthereanappthat.com',
           'indices': [16, 38],
           'url': 'http://t.co/QDVYv6bV90'}],
 'user_mentions': []}

Now you are able to access hashtags:

>>> entities['hashtags']
[]

Which just happens to be the empty list.

To convert to JSON and back, note the comment:

>>> import json
>>> # Make sure output is the list object not a string representing the object
>>> json_string = json.dumps(output)
>>> jason = json.loads(json_string)
>>> jason[0]['entities']['hashtags']
[]

I think your problem is that output was already a string before you passed it to json.dumps, so json.loads returns that string back, not a list or dict.

And @Dan's answer is correct: this is not valid JSON. It is, however, a valid Python literal (a list containing a dict), and I'm assuming that you got it from Twitter using Python and then printed it.
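
To illustrate the round trip, here is a small sketch with a made-up data value standing in for the Twitter result. It assumes the string case arose from something like str(output); the question does not show how output was actually produced:

```python
import json

# Made-up stand-in for the list returned by the API call.
data = [{"entities": {"hashtags": []}}]

# list -> dumps -> loads gives back an equal list.
from_list = json.loads(json.dumps(data))

# str(data) is a Python repr, so dumps() wraps it as ONE JSON string,
# and loads() simply hands that same str back.
from_str = json.loads(json.dumps(str(data)))

print(type(from_list).__name__)  # list
print(type(from_str).__name__)   # str

try:
    from_str["hashtags"]
except TypeError as exc:
    # Indexing a str with a str key is what raises the TypeError.
    print(exc)
```

Checking type(output) before calling json.dumps is a quick way to see which of the two cases you are in.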

4 Comments

His issue rather was that he encoded a result that was already Python back to JSON, and then back to Python again, and then hit another unrelated problem ;-)
I'll revert my edit to the question - your answer makes much more sense if the question stays in its original form.
@Lukas ... well if you wanna get technical... :P
In short, dict --> json.dumps() --> json.loads() will return you a dict. string --> json.dumps() --> json.loads() will return you a string. What you give is what you receive. This is Karma.
9

First off, your JSON example is not valid JSON; the Twitter API would not output this, because it would break every conforming JSON consumer.

  • jsonlint shows the first, obvious syntax error: single-quoted rather than double quoted strings.
  • Secondly, you have None where JSON requires null, False instead of false, and True instead of true.
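
The two differences listed above can be seen side by side in a short sketch. The sample value is invented for illustration and only covers the three literals mentioned:

```python
import json

# Made-up sample covering None, False and True.
sample = [{"place": None, "retweeted": False, "verified": True}]

python_repr = str(sample)       # what print(sample) shows
json_text = json.dumps(sample)  # what a JSON producer emits

print(python_repr)  # [{'place': None, 'retweeted': False, 'verified': True}]
print(json_text)    # [{"place": null, "retweeted": false, "verified": true}]

# The Python repr is rejected by the JSON parser.
try:
    json.loads(python_repr)
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False

print(repr_is_json)  # False
```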

Your alleged "JSON" example appears to have been pre-decoded into Python :). When I use a snippet of real JSON, it works exactly as expected:

import json

json_string = r"""
[{"actual_json_key":"actual_json_value"}]
"""

jason = json.loads(json_string)
print(jason[0]["actual_json_key"])


0

The same thing was happening to me, and I tried multiple things. When I wrote the result to a JSON file, I found that there were characters like '\u00A0' (a non-breaking space) in my string.

import json

# Placeholder: in practice x holds the JSON text that was received.
x = 'my string which i wanted to convert'
# Strip the non-breaking spaces that were causing the problem.
x = x.replace("\u00A0","")
x = json.dumps(x)
z = json.loads(json.loads(x))

It worked for me; the character was causing the issue. Thanks to dKen's answer for the double json.loads part.


0

In my case, the first json.loads() only unescaped the JSON string. In the end I had to call it twice: json.loads(json.loads(json_str))


0

I have a Python app that takes a JSON request. If I use, say, Postman to post the JSON to the service, it works:

args : dict = json.loads( request.data.decode('utf-8')) 

But when I use the web app in the browser, this does not return a dict; it returns a str.

To get around that, I had to do this:

args : dict = json.loads( json.loads( request.data.decode('utf-8') ) ) 

I think I am going to have to do some RTTI (run-time type checking) to check the type and optionally do the second json.loads in such situations.

This worked:

s = json.loads( request.data.decode('utf-8') )
args : dict = dict()
if type( s ) is str :
    # have to do it like this to get it into a dict not just a str
    args = json.loads( s )
else :
    args = s
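
The same type check could be wrapped in a small helper so it does not have to be repeated at each call site. This is only a sketch: loads_fully and its max_depth parameter are hypothetical names, not part of the json module, and the inputs below are invented for illustration:

```python
import json


def loads_fully(raw, max_depth=2):
    """Decode JSON text, decoding again while the result is still a str.

    Hypothetical helper; max_depth is an arbitrary cap on the number
    of json.loads() calls.
    """
    value = json.loads(raw)
    for _ in range(max_depth - 1):
        if not isinstance(value, str):
            break  # already a dict, list, number, etc.
        try:
            value = json.loads(value)
        except json.JSONDecodeError:
            break  # a genuine plain string, keep it as-is
    return value


single = '{"a": 1}'          # encoded once
double = json.dumps(single)  # encoded twice

print(loads_fully(single))  # {'a': 1}
print(loads_fully(double))  # {'a': 1}
```

One caveat with this design: a payload that is legitimately a string containing JSON-looking text (for example "123") would also be decoded a second time, so the helper only suits endpoints where the final result is expected to be a dict or list.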

2 Comments

While this answer repeats what others have said about calling loads twice, it also adds a solution to automatically detect if that's necessary, thank you for that contribution!
What's the point of the initial dict() assignment?
