1

I have a pandas dataframe with multiple headers. I am wondering how to convert this into a list of nested directories. Each row in the Pandas dataframe will be a nested dictionary in the list.

Here is an example

#Creaet an example multiheader dataframe col =['id','x, single room','x, double room','y, single room','y, double room' ] df = pd.DataFrame([[1,2,3,4,5], [3,4,7,5,3]], columns=col) a = df.columns.str.split(', ', expand=True).values #swap values in NaN and replace NAN to '' df.columns = pd.MultiIndex.from_tuples([('', x[0]) if pd.isnull(x[1]) else x for x in a]) df 

result

 x y id single room double room single room double room 0 1 2 3 4 5 1 3 4 7 5 3 

This is the dataframe I want to convert to a list of nested dictionaries. So this is the desired result

[{'id': 1, 'x': {'double room': 3, 'single room': 2}, 'y': {'double room': 5, 'single room': 4}}, {'id': 3, 'x': {'double room': 7, 'single room': 4}, 'y': {'double room': 3, 'single room': 5}}] 

In the code below, I directly create this list.

firstDict = { 'id':1, 'x':{'single room':2, 'double room':3}, 'y':{'single room':4, 'double room':5} } secondDict = { 'id':3, 'x':{'single room':4, 'double room':7}, 'y':{'single room':5, 'double room':3} } dictList = [] dictList.append( firstDict ) dictList.append( secondDict ) dictList [{'id': 1, 'x': {'double room': 3, 'single room': 2}, 'y': {'double room': 5, 'single room': 4}}, {'id': 3, 'x': {'double room': 7, 'single room': 4}, 'y': {'double room': 3, 'single room': 5}}] 

So in summary, how to I convert the dataframe df to what dictList is.

Edit:

This is a minimal example, the solution I am looking for should generalize to longer numbers of headers.

5 Answers 5

6

I don't think there is a straight forward way of doing this, that being said, you could use stack + to_dict and some post-processing afterwards:

# prepare the DataFrame df = df.set_index(('', 'id')).stack(level=0) df.index.names = ['id', None] # convert to a dicts of dicts d = {} for (idi, key), values in df.to_dict('index').items(): d.setdefault(idi, {}).update({key: values}) # convert d to list of dicts result = [{'id': k, **values} for k, values in d.items()] 

Output

[{'id': 1, 'x': {'double room': 3, 'single room': 2}, 'y': {'double room': 5, 'single room': 4}}, {'id': 3, 'x': {'double room': 7, 'single room': 4}, 'y': {'double room': 3, 'single room': 5}}] 
Sign up to request clarification or add additional context in comments.

2 Comments

I am wondering how to apply df = df.set_index(('', 'id')).stack(level=0) in cases where the column to set index also has multiple headers. In my case, the top level column name is 'ID', and 2nd level column name is 'New'. I tried df2 = l[51].set_index(('ID', 'New')).stack(level=0) but got KeyError: ('ID', 'New')
That must be because there is no column named liked that, try to take a look at your column name
2

Unsure of how long your number of headers can be, currently it is in a state of easily being coded by hand, as given below -

dct = [] for x in df.values: nd = { "id": x[0], "x": { "single room": x[1], "double room": x[2] }, "y": { "single room": x[3], "double room": x[4] } } dct.append(nd) 

Do let me know if there are a high number of headers and the code needs to handle them without explicit typing in.

Comments

1

Something like this?

import pandas as pd col =['id','x, single room','x, double room','y, single room','y, double room' ] df = pd.DataFrame([[1,2,3,4,5], [3,4,7,5,3]], columns=col) a = df.columns.str.split(', ', expand=True).values #swap values in NaN and replace NAN to '' df.columns = pd.MultiIndex.from_tuples([('', x[0]) if pd.isnull(x[1]) else x for x in a]) print(df) dict_list = [] for index, row in df.iterrows(): d = {} # _dict [row["id"]] print(type(row), row)#, row.select(1, axis = 0) ) d["id"] = row[0] d["x"] = {'single room':row[1], 'double room':row[1]} d["y"] = {'single room':row[3], 'double room':row[4]} dict_list.append(d) print(dict_list) 

OUTPUT:

[{'id': 1, 'x': {'single room': 2, 'double room': 2}, 'y': {'single room': 4, 'double room': 5} }, {'id': 3, 'x': {'single room': 4, 'double room': 4}, 'y': {'single room': 5, 'double room': 3} } ] 

Comments

1

you could use either

l = [] d = None for i, row in df.iterrows(): for (i1,i2),v in row.iteritems(): if i2 == 'id': d = {i2:v} l.append(d) continue try: d[i1][i2]=v except KeyError: d[i1] = {i2:v} 

or if you're ok with a slight modification of your expected result:

from collections import defaultdict l =[] for i, row in df.iterrows(): d = defaultdict(dict) for (i1,i2),v in row.iteritems(): if i2 == 'id': d[i2][v]=v else: d[i1][i2]=v l.append(dict(d)) 

which outputs:

[{'id': {1: 1}, 'x': {'single room': 2, 'double room': 3}, 'y': {'single room': 4, 'double room': 5}}, {'id': {3: 3}, 'x': {'single room': 4, 'double room': 7}, 'y': {'single room': 5, 'double room': 3}}] 

Comments

1

I like accepted solution, but here my two alternatives wihout stacking.

This solution is straight-forward but with more columns a lot of repetition and error-prone:

lst = [{'id': d[('', 'id')], 'x': {'single room': d[('x', 'single room')], 'double room': d[('x', 'double room')]}, 'y': {'single room': d[('y', 'single room')], 'double room': d[('y', 'double room')]},} for d in df.to_dict('records') ] 

Let's try make it more scalable, from Arbitrarily nested dictionary from tuples you can get nest function:

def nest(d: dict) -> dict: result = {} for key, value in d.items(): target = result for k in key[:-1]: target = target.setdefault(k, {}) target[key[-1]] = value return result 

But for ('', id) we need slighlty different behavior:

def nest_m(d: dict) -> dict: result = {} for key, value in d.items(): if key == ('', 'id'): result['id'] = value else: target = result for k in key[:-1]: target = target.setdefault(k, {}) target[key[-1]] = value return result 

Final line:

lst = [nest_m(d) for d in df.to_dict('records')] 

Output:

[{'id': 1, 'x': {'single room': 2, 'double room': 3}, 'y': {'single room': 4, 'double room': 5}}, {'id': 3, 'x': {'single room': 4, 'double room': 7}, 'y': {'single room': 5, 'double room': 3}}] 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.