nested dictionary comprehension

Question

For the following nested dictionaries, I would like to sum the values for each of the 'ab', 'bc', 'cd' and 'de' keys respectively; basically, collapse the dictionaries into one. Preferably using a comprehension with sum, but I cannot figure out the proper syntax:

{'hot': {'111': {'ab': 1, 'bc': 3, 'cd': 5, 'de': 7}}}
{'hot': {'111': {'ab': 12.5, 'bc': -31, 'cd': 2.5, 'de': 13}}}
{'hot': {'111': {'ab': 10, 'bc': 3, 'cd': 0, 'de': -2}}}
{'hot': {'110': {'ab': -1, 'bc': 0, 'cd': 1, 'de': 1}}}
{'hot': {'110': {'ab': 8, 'bc': 20, 'cd': 41, 'de': 13}}}
{'hot': {'110': {'ab': 1.75, 'bc': 2.3, 'cd': 6, 'de': 0}}}
{'hot': {'109': {'ab': 2.7, 'bc': 24, 'cd': 4, 'de': 5}}}
{'hot': {'109': {'ab': 41, 'bc': 6, 'cd': 12, 'de': 33}}}
{'hot': {'109': {'ab': 32, 'bc': 7, 'cd': 18, 'de': 3.75}}}
{'cold': {'111': {'ab': 25, 'bc': 2, 'cd': 3, 'de': 2.1}}}
{'cold': {'111': {'ab': 5, 'bc': 8, 'cd': 5, 'de': 17}}}
{'cold': {'111': {'ab': -71, 'bc': 42, 'cd': 5, 'de': 16}}}
{'cold': {'110': {'ab': 23, 'bc': 2.4, 'cd': 2.1, 'de': 4.3}}}
{'cold': {'110': {'ab': 11, 'bc': 2.8, 'cd': 4.5, 'de': 2.4}}}
{'cold': {'110': {'ab': 4, 'bc': 5.7, 'cd': 8.7, 'de': 1}}}

Desired output:

dict['hot']['111'][AB] = 1 + 12.5 + 10 = 23.5
dict['hot']['111'][BC] = 3 - 31 + 3 = -25

etc

Do you want all of the 'ab' values in one sum, or do you want to sum the 'ab' values for each respective key path in the dictionary (as in your desired output)?
– Easton Bornemeier, Commented Jun 23, 2017 at 21:24
It's not clear what your input is. Are you saying you've got multiple dicts with similar keys, and you want to sum the values grouped by key?
– OldGeeksGuide, Commented Jun 23, 2017 at 21:25

cs95 · Accepted Answer · 2017-06-23 21:37:08Z

I assume your data is in a list; with that assumption, you get the answers you expect.

data = [
    {'hot': {'111': {'ab': 1, 'bc': 3, 'cd': 5, 'de': 7}}},
    {'hot': {'111': {'ab': 12.5, 'bc': -31, 'cd': 2.5, 'de': 13}}},
    {'hot': {'111': {'ab': 10, 'bc': 3, 'cd': 0, 'de': -2}}},
    {'hot': {'110': {'ab': -1, 'bc': 0, 'cd': 1, 'de': 1}}},
    {'hot': {'110': {'ab': 8, 'bc': 20, 'cd': 41, 'de': 13}}},
    {'hot': {'110': {'ab': 1.75, 'bc': 2.3, 'cd': 6, 'de': 0}}},
    {'hot': {'109': {'ab': 2.7, 'bc': 24, 'cd': 4, 'de': 5}}},
    {'hot': {'109': {'ab': 41, 'bc': 6, 'cd': 12, 'de': 33}}},
    {'hot': {'109': {'ab': 32, 'bc': 7, 'cd': 18, 'de': 3.75}}},
    {'cold': {'111': {'ab': 25, 'bc': 2, 'cd': 3, 'de': 2.1}}},
    {'cold': {'111': {'ab': 5, 'bc': 8, 'cd': 5, 'de': 17}}},
    {'cold': {'111': {'ab': -71, 'bc': 42, 'cd': 5, 'de': 16}}},
    {'cold': {'110': {'ab': 23, 'bc': 2.4, 'cd': 2.1, 'de': 4.3}}},
    {'cold': {'110': {'ab': 11, 'bc': 2.8, 'cd': 4.5, 'de': 2.4}}},
    {'cold': {'110': {'ab': 4, 'bc': 5.7, 'cd': 8.7, 'de': 1}}},
]

And the code is this:

from collections import defaultdict

counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

for d in data:                        # for the list
    for k1 in d:                      # for the hot-cold level
        for k2 in d[k1]:              # for the 1[0-9]{2} level
            for k3 in d[k1][k2]:      # for the [a-z]{2} level
                counts[k1][k2][k3] += d[k1][k2][k3]

print(counts['hot']['111']['ab'])
print(counts['hot']['111']['bc'])

There are 2 levels of defaultdict nesting inside the outer defaultdict, one for each key level below 'hot'/'cold'.

Output:

23.5
-25
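Since the question asked for a comprehension with sum, here is a minimal sketch of how the same grouping might be written that way. It assumes the input is a list shaped like data above; the shortened sample list and the names paths and totals are made up for illustration. Note that it rescans the whole list for every key path, so the loop version above does less work on large inputs.

```python
# Hypothetical, shortened sample with the same shape as `data` above.
data = [
    {'hot': {'111': {'ab': 1, 'bc': 3}}},
    {'hot': {'111': {'ab': 12.5, 'bc': -31}}},
    {'hot': {'111': {'ab': 10, 'bc': 3}}},
    {'cold': {'110': {'ab': 23, 'bc': 2.4}}},
]

# Every (k1, k2, k3) key path that occurs anywhere in the input.
paths = {
    (k1, k2, k3)
    for d in data
    for k1, level2 in d.items()
    for k2, level3 in level2.items()
    for k3 in level3
}

# Nested dict comprehension: one level per key, with sum() at the innermost level.
# Dicts that lack a given path contribute 0 through the chained .get() calls.
totals = {
    k1: {
        k2: {
            k3: sum(d.get(k1, {}).get(k2, {}).get(k3, 0) for d in data)
            for k3 in {p[2] for p in paths if p[:2] == (k1, k2)}
        }
        for k2 in {p[1] for p in paths if p[0] == k1}
    }
    for k1 in {p[0] for p in paths}
}

print(totals['hot']['111']['ab'])
print(totals['hot']['111']['bc'])
```

The result is a plain nested dict rather than a defaultdict, so looking up a path that never occurred raises KeyError instead of silently creating an entry.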

You shouldn't have to use imports for this. The guy is clearly new to this type of thing in Python, and it's mainly about organizing the data. Your example has too many loops, extra overhead.
You need one loop per depth of nesting. That's just the nature of OP's data.
@spikespaz well, you could just use machine code, then you don't even have to use Python!
@juanpa.arrivillaga Ok, I get it. It's Python, it's slow no matter what. But even if it is a slow language, there's no reason to make it work even slower, is there?
@spikespaz what? No. My point was that saying "You shouldn't have to use imports for this" is bad advice. The best advice you can give to someone new to programming is "don't reinvent the wheel."

Jacob Birkett · Accepted Answer · 2017-06-23 21:45:24Z

This example builds a "getter" function. It might reduce a little overhead compared to parsing the whole list of dicts at once.

The two loops below can be reduced to one by adding to the sum inside the first loop; they are kept separate here, with an intermediate accepted list and a second loop, for demonstration purposes.

Here is a complete code example which prints out the desired result, 23.5. First, create a list of the dictionaries you want to read from:

dictionaries = [
    {'hot': {'111': {'ab': 1, 'bc': 3, 'cd': 5, 'de': 7}}},
    {'hot': {'111': {'ab': 12.5, 'bc': -31, 'cd': 2.5, 'de': 13}}},
    {'hot': {'111': {'ab': 10, 'bc': 3, 'cd': 0, 'de': -2}}},
    {'hot': {'110': {'ab': -1, 'bc': 0, 'cd': 1, 'de': 1}}},
    {'hot': {'110': {'ab': 8, 'bc': 20, 'cd': 41, 'de': 13}}},
    {'hot': {'110': {'ab': 1.75, 'bc': 2.3, 'cd': 6, 'de': 0}}},
    {'hot': {'109': {'ab': 2.7, 'bc': 24, 'cd': 4, 'de': 5}}},
    {'hot': {'109': {'ab': 41, 'bc': 6, 'cd': 12, 'de': 33}}},
    {'hot': {'109': {'ab': 32, 'bc': 7, 'cd': 18, 'de': 3.75}}},
    {'cold': {'111': {'ab': 25, 'bc': 2, 'cd': 3, 'de': 2.1}}},
    {'cold': {'111': {'ab': 5, 'bc': 8, 'cd': 5, 'de': 17}}},
    {'cold': {'111': {'ab': -71, 'bc': 42, 'cd': 5, 'de': 16}}},
    {'cold': {'110': {'ab': 23, 'bc': 2.4, 'cd': 2.1, 'de': 4.3}}},
    {'cold': {'110': {'ab': 11, 'bc': 2.8, 'cd': 4.5, 'de': 2.4}}},
    {'cold': {'110': {'ab': 4, 'bc': 5.7, 'cd': 8.7, 'de': 1}}},
]

Next, make your function.

def get_sum(temp, num, pt):
    accepted = []  # Initialize a list of accepted dictionaries that fit the arguments passed.
    pt_sum = 0     # Initialize the variable for the sum of your parts, starting at 0.

    for dictionary in dictionaries:  # Iterate through the dictionary list.
        if temp in dictionary and num in dictionary[temp]:  # Check if the dict on the current iteration has what you want.
            accepted.append(dictionary[temp][num])  # It does, so add it to accepted.
            # Let's pause here. Say you are reading the first dict in the list.
            # That means this is what the function is working with:
            #     {'hot': {'111': {'ab': 1, 'bc': 3, 'cd': 5, 'de': 7}}}
            # With the append call, we are looking up "dictionary[temp][num]".
            # We know that each of these keys exists, because we just checked.
            # This eliminates the need to add the whole dictionary to "accepted".
            # Basically, we are cutting out the last section, because that's what we need.
            # So you end up with "{'ab': 1, 'bc': 3, 'cd': 5, 'de': 7}" in the list "accepted".

    for dictionary in accepted:   # Now go through the ones that have the data you need.
        pt_sum += dictionary[pt]  # And simply add the value to the sum.

    return pt_sum  # Return the part sum.

And now you can use it:

print(get_sum("hot", "111", "ab"))
>>> 23.5

The simplified code that I mentioned at the top would be this:

def get_sum(temp, num, pt):
    pt_sum = 0
    for dictionary in dictionaries:
        if temp in dictionary and num in dictionary[temp]:
            pt_sum += dictionary[temp][num][pt]
    return pt_sum

This simply adds to pt_sum inside the first loop, so the second iteration, which was never required, is gone.
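The same getter could probably be collapsed further into a single sum over a generator expression, which is close to the "comprehension with sum" the question asked for. This is only a sketch under a few assumptions: the list is passed in as a parameter instead of being read from a global, the sample list is shortened for illustration, and a missing key at any level counts as 0 rather than raising KeyError.

```python
def get_sum(dictionaries, temp, num, pt):
    """Sum the `pt` values found under dictionaries[i][temp][num].

    Assumption: a missing key at any level contributes 0 to the sum.
    """
    return sum(d.get(temp, {}).get(num, {}).get(pt, 0) for d in dictionaries)


# Hypothetical, shortened sample with the same shape as the list above.
dictionaries = [
    {'hot': {'111': {'ab': 1, 'bc': 3, 'cd': 5, 'de': 7}}},
    {'hot': {'111': {'ab': 12.5, 'bc': -31, 'cd': 2.5, 'de': 13}}},
    {'hot': {'111': {'ab': 10, 'bc': 3, 'cd': 0, 'de': -2}}},
    {'cold': {'110': {'ab': 23, 'bc': 2.4, 'cd': 2.1, 'de': 4.3}}},
]

print(get_sum(dictionaries, "hot", "111", "ab"))
```

Passing the list as an argument keeps the function independent of module-level state, which makes it easier to reuse on other input.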

It's not like you've removed the loops at all. It's just that you're fetching the sum for a single combination. Now try fetching them all.
@Coldspeed That wasn't the question. From the way he said he wants to use it, dict['hot']['111'][AB] looks like fetching values based on keys he already knows. I gave him a function that does that. It's basically the same thing, except with parentheses, quotation marks, and commas instead of square brackets.
Sorry, the data comes from an ElementTree for loop, reading from parsed XML. I just did not want to overburden the question. Basically, I am trying to store parts of the XML in a nested dictionary, perform some calculations, and save the result as new XML.
@Vrun Could I see the output from the parser in a pastebin or something?
