71

I'm trying to get a list of all keys in a list of dictionaries in order to fill out the fieldnames argument for csv.DictWriter.

Previously, I had something like this:

[ {"name": "Tom", "age": 10}, {"name": "Mark", "age": 5}, {"name": "Pam", "age": 7} ] 

and I was using fieldnames = list[0].keys() to take the first dictionary in the list and extract its keys.

Now I have something like this where one of the dictionaries has more key:value pairs than the others (could be any of the results). The new keys are added dynamically based on information coming from an API so they may or may not occur in each dictionary and I don't know in advance how many new keys there will be.

[ {"name": "Tom", "age": 10}, {"name": "Mark", "age": 5, "height":4}, {"name": "Pam", "age": 7} ] 

I can't just use fieldnames = list[1].keys() since it isn't necessarily the second element that will have extra keys.

A simple solution would be to find the dictionary with the greatest number of keys and use it for the fieldnames, but that won't work if you have an example like this:

[ {"name": "Tom", "age": 10}, {"name": "Mark", "age": 5, "height":4}, {"name": "Pam", "age": 7, "weight":90} ] 

where both the second and third dictionary have 3 keys but the end result should really be the list ["name", "age", "height", "weight"]

7 Answers

108
all_keys = set().union(*(d.keys() for d in mylist)) 

Edit: have to unpack the list. Now fixed.
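
To connect this back to the question, here is a minimal sketch of feeding the result to csv.DictWriter. It assumes that alphabetical column order is acceptable (a set has no defined order, so sorted() is used only to make the header reproducible) and that missing values should be written as empty cells; the io.StringIO buffer is just a stand-in for a real output file.

```python
import csv
import io

mylist = [
    {"name": "Tom", "age": 10},
    {"name": "Mark", "age": 5, "height": 4},
    {"name": "Pam", "age": 7, "weight": 90},
]

# Union of every dict's keys; sorted() gives a reproducible column order,
# because a set on its own has no defined order.
fieldnames = sorted(set().union(*(d.keys() for d in mylist)))

buffer = io.StringIO()  # stand-in for open("out.csv", "w", newline="")
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(mylist)  # keys missing from a row are filled with restval
csv_text = buffer.getvalue()
print(csv_text)
```

restval is what DictWriter writes for a key that a given row lacks, so rows with fewer keys do not raise an error.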


2 Comments

This solution works perfectly, but it seems to produce a list of keys that have a different order than the list of dictionaries they were extracted from. Any idea how to keep the indexing? Thank you!
@Momchill order is not guaranteed because he is using a set. I will post a snippet below for you that uses a list.
37

Your data:

>>> LoD
[{'age': 10, 'name': 'Tom'}, {'age': 5, 'name': 'Mark', 'height': 4}, {'age': 7, 'name': 'Pam', 'weight': 90}]

This set comprehension will do it:

>>> {k for d in LoD for k in d.keys()}
{'age', 'name', 'weight', 'height'}

It works this way. First, create a list of lists of the dict keys:

>>> [list(d.keys()) for d in LoD]
[['age', 'name'], ['age', 'name', 'height'], ['age', 'name', 'weight']]

Then create a flattened version of this list of lists:

>>> [i for s in [d.keys() for d in LoD] for i in s]
['age', 'name', 'age', 'name', 'height', 'age', 'name', 'weight']

And create a set to eliminate duplicates:

>>> set([i for s in [d.keys() for d in LoD] for i in s])
{'age', 'name', 'weight', 'height'}

Which can be simplified to:

{k for d in LoD for k in d.keys()} 

If you wish to maintain the order in which the keys are first encountered in the list of dicts, you can use a dict instead of a set to eliminate the duplicates. Dicts preserve insertion order in CPython 3.6 as an implementation detail, and this is guaranteed by the language from Python 3.7 onward; sets make no ordering guarantee.

You could do:

>>> list({k:None for d in LoD for k in d.keys()}.keys())
['age', 'name', 'height', 'weight']

Or,

>>> [k for k in {k:None for d in LoD for k in d}]
['age', 'name', 'height', 'weight']
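
The same order-preserving idea can also be written with dict.fromkeys and itertools.chain.from_iterable, both from the standard library. This is only an alternative spelling of the dict comprehension above, sketched under the assumption that you are on Python 3.7 or later, where dict insertion order is guaranteed.

```python
from itertools import chain

LoD = [
    {"age": 10, "name": "Tom"},
    {"age": 5, "name": "Mark", "height": 4},
    {"age": 7, "name": "Pam", "weight": 90},
]

# chain.from_iterable(LoD) yields every key of every dict in turn
# (iterating a dict yields its keys); dict.fromkeys keeps only the
# first occurrence of each key and remembers the order it was seen in.
ordered_keys = list(dict.fromkeys(chain.from_iterable(LoD)))
print(ordered_keys)  # ['age', 'name', 'height', 'weight']
```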

1 Comment

You don't need d.keys() at all here: iterating a dictionary gives you its keys by default! {k for d in LoD for k in d}
5
from itertools import chain

lis = [
    {"name": "Tom", "age": 10},
    {"name": "Mark", "age": 5, "height": 4},
    {"name": "Pam", "age": 7, "weight": 90},
]

# without qualification a dict iterates over its keys
# and set takes any iterable in its constructor
headers_as_set = set(chain.from_iterable(lis))

# you asked for a list
headers = list(set(chain.from_iterable(lis)))


4
>>> lis = [{"name": "Tom", "age": 10}, {"name": "Mark", "age": 5, "height": 4}, {"name": "Pam", "age": 7, "weight": 90}]
>>> {z for y in (x.keys() for x in lis) for z in y}
set(['age', 'name', 'weight', 'height'])


3

Borrowing lis from @AshwiniChaudhary's answer, here is an explanation of how you could solve your problem.

>>> lis=[ {"name": "Tom", "age": 10}, {"name": "Mark", "age": 5, "height":4}, {"name": "Pam", "age": 7, "weight":90} ] 

Iterating directly over a dict yields its keys, so you don't have to call keys() to get them. That saves a function call for each dictionary in your list and, in Python 2, where keys() builds a new list, a list construction as well.

>>> {k for d in lis for k in d}
set(['age', 'name', 'weight', 'height'])

or use itertools.chain:

>>> from itertools import chain
>>> {k for k in chain(*lis)}
set(['age', 'name', 'weight', 'height'])


2

The following example will extract the keys:

set_ = set()
for dict_ in dictionaries:
    set_.update(dict_.keys())
print(set_)


0

If order matters to you, read on...

Input your data:

>>> list_of_dicts = [{'age': 10, 'name': 'Tom'},{'age': 5, 'name': 'Mark', 'height': 4}, {'age': 7, 'name': 'Pam', 'weight': 90}] 

Define your function:

>>> def get_all_keys_in_order(list_of_dicts):
...     ordered_keys = []
...     for dict_ in list_of_dicts:
...         for key in dict_:
...             if key not in ordered_keys:
...                 ordered_keys.append(key)
...     return ordered_keys
...

Run your function to get output:

>>> get_all_keys_in_order(list_of_dicts)
['age', 'name', 'height', 'weight']

1 Comment

@Momchill I think this solves your problem. Note that this algorithm is slower than the set-based solutions, because the check key not in ordered_keys scans the whole list each time. That could be a problem if you are working with big data, but for small data it is fine.
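
If the list scan ever becomes a bottleneck, one possible variant keeps a set alongside the list purely for membership checks. This is a sketch, not part of the original answer, and the function name get_all_keys_in_order_fast is made up for illustration.

```python
def get_all_keys_in_order_fast(list_of_dicts):
    """Return every key in first-seen order, using a set for fast lookups."""
    seen = set()        # average O(1) membership checks
    ordered_keys = []   # preserves the order of first appearance
    for dict_ in list_of_dicts:
        for key in dict_:
            if key not in seen:
                seen.add(key)
                ordered_keys.append(key)
    return ordered_keys


list_of_dicts = [
    {"age": 10, "name": "Tom"},
    {"age": 5, "name": "Mark", "height": 4},
    {"age": 7, "name": "Pam", "weight": 90},
]
print(get_all_keys_in_order_fast(list_of_dicts))
# ['age', 'name', 'height', 'weight']
```

The output order matches the list-only version; only the cost of each membership check changes.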
