2

I'm trying to flatten a JSON file that was originally converted from XML using xmltodict(). There are multiple fields that may have a list of dictionaries. I've tried using record_path with meta data to no avail, but I have not been able to get it to work when there are multiple fields that may have other nested fields. It's expected that some fields will be empty for any given record

I have tried searching for another topic and couldn't find my specific problem with multiple nested fields. Can anyone point me in the right direction?

Thanks for any help that can be provided!

Sample base Python (without the record path)

import pandas as pd import json with open('./example.json', encoding="UTF-8") as json_file: json_dict = json.load(json_file) df = pd.json_normalize(json_dict['WIDGET']) print(df) df.to_csv('./test.csv', index=False) 

Sample JSON

{ "WIDGET": [ { "ID": "6", "PROBLEM": "Electrical", "SEVERITY_LEVEL": "1", "TITLE": "Battery's Missing", "CATEGORY": "User Error", "LAST_SERVICE": "2020-01-04T17:39:37Z", "NOTICE_DATE": "2022-01-01T08:00:00Z", "FIXABLE": "1", "COMPONENTS": { "WHATNOTS": { "WHATNOT1": "Battery Compartment", "WHATNOT2": "Whirlygig" } }, "DIAGNOSIS": "Customer needs to put batteries in the battery compartment", "STATUS": "0", "CONTACT_TYPE": { "CALL": "1" } }, { "ID": "1004", "PROBLEM": "Electrical", "SEVERITY_LEVEL": "4", "TITLE": "Flames emit from unit", "CATEGORY": "Dangerous", "LAST_SERVICE": "2015-06-04T21:40:12Z", "NOTICE_DATE": "2022-01-01T08:00:00Z", "FIXABLE": "0", "DIAGNOSIS": "A demon seems to have possessed the unit and his expelling flames from it", "CONSEQUENCE": "Could burn things", "SOLUTION": "Call an exorcist", "KNOWN_PROBLEMS": { "PROBLEM": [ { "TYPE": "RECALL", "NAME": "Bad Servo", "DESCRIPTION": "Bad servo's shipped in initial product" }, { "TYPE": "FAILURE", "NAME": "Operating outside normal conditions", "DESCRIPTION": "Device failed when customer threw into wood chipper" } ] }, "STATUS": "1", "REPAIR_BULLETINS": { "BULLETIN": [ { "@id": "4", "#text": "Known target of the occult" }, { "@id": "5", "#text": "Not meant to be thrown into wood chippers" } ] }, "CONTACT_TYPE": { "CALL": "1" } } ] } 

Sample CSV

ID PROBLEM SEVERITY_LEVEL TITLE CATEGORY LAST_SERVICE NOTICE_DATE FIXABLE DIAGNOSIS STATUS COMPONENTS.WHATNOTS.WHATNOT1 COMPONENTS.WHATNOTS.WHATNOT2 CONTACT_TYPE.CALL CONSEQUENCE SOLUTION KNOWN_PROBLEMS.PROBLEM REPAIR_BULLETINS.BULLETIN
6 Electrical 1 Battery's Missing User Error 2020-01-04T17:39:37Z 2022-01-01T08:00:00Z 1 Customer needs to put batteries in the battery compartment 0 Battery Compartment Whirlygig 1
1004 Electrical 4 Flames emit from unit Dangerous 2015-06-04T21:40:12Z 2022-01-01T08:00:00Z 0 A demon seems to have possessed the unit and his expelling flames from it 1 1 Could burn things Call an exorcist [{'TYPE': 'RECALL', 'NAME': 'Bad Servo', 'DESCRIPTION': "Bad servo's shipped in initial product"}, {'TYPE': 'FAILURE', 'NAME': 'Operating outside normal conditions', 'DESCRIPTION': 'Device failed when customer threw into wood chipper'}] [{'@id': '4', '#text': 'Known target of the occult'}, {'@id': '5', '#text': 'Not meant to be thrown into wood chippers'}]

1 Answer 1

1

I have attempted to extract the data and turned it into nested dictionary (instead of nested with list), so that pd.json_normalize() can work

for row in range(len(json_dict['WIDGET'])): try: lis = json_dict['WIDGET'][row]['KNOWN_PROBLEMS']['PROBLEM'] del json_dict['WIDGET'][row]['KNOWN_PROBLEMS']['PROBLEM'] for i, item in enumerate(lis): json_dict['WIDGET'][row]['KNOWN_PROBLEMS'][str(i)] = item lis = json_dict['WIDGET'][row]['REPAIR_BULLETINS']['BULLETIN'] del json_dict['WIDGET'][row]['REPAIR_BULLETINS']['BULLETIN'] for i, item in enumerate(lis): json_dict['WIDGET'][row]['REPAIR_BULLETINS'][str(i)] = item except KeyError: continue df = pd.json_normalize(json_dict['WIDGET']).T print(df) 

If you have to manually add the varying keys from the larger dataset, here's a way to extract them automatically by identifying them as type list (and provided they are nested by 2 levels only)

linkage = [] for item in json_dict['WIDGET']: for k1 in item.keys(): #get keys from first level if isinstance(item[k1], str): continue #print(item[k1]) for k2 in item[k1].keys(): #get keys from second level if isinstance(item[k1][k2], str): continue #print(item[k1][k2]) if isinstance(item[k1][k2], list): linkage.append((k1, k2)) print(linkage) # [('KNOWN_PROBLEMS', 'PROBLEM'), ('REPAIR_BULLETINS', 'BULLETIN')] for row in range(len(json_dict['WIDGET'])): for link in linkage: try: lis = json_dict['WIDGET'][row][link[0]][link[1]] del json_dict['WIDGET'][row][link[0]][link[1]] #delete original dict value (which is a list) for i, item in enumerate(lis): json_dict['WIDGET'][row][link[0]][str(i)] = item #replace list with dict value (which is a dict) except KeyError: continue df = pd.json_normalize(json_dict['WIDGET']).T print(df) 

Output:

 0 1 ID 6 1004 PROBLEM Electrical Electrical SEVERITY_LEVEL 1 4 TITLE Battery's Missing Flames emit from unit CATEGORY User Error Dangerous LAST_SERVICE 2020-01-04T17:39:37Z 2015-06-04T21:40:12Z NOTICE_DATE 2022-01-01T08:00:00Z 2022-01-01T08:00:00Z FIXABLE 1 0 DIAGNOSIS Customer needs to put batt... A demon seems to have poss... STATUS 0 1 COMPONENTS.WHATNOTS.WHATNOT1 Battery Compartment NaN COMPONENTS.WHATNOTS.WHATNOT2 Whirlygig NaN CONTACT_TYPE.CALL 1 1 CONSEQUENCE NaN Could burn things SOLUTION NaN Call an exorcist KNOWN_PROBLEMS.0.TYPE NaN RECALL KNOWN_PROBLEMS.0.NAME NaN Bad Servo KNOWN_PROBLEMS.0.DESCRIPTION NaN Bad servo's shipped in ini... KNOWN_PROBLEMS.1.TYPE NaN FAILURE KNOWN_PROBLEMS.1.NAME NaN Operating outside normal c... KNOWN_PROBLEMS.1.DESCRIPTION NaN Device failed when custome... REPAIR_BULLETINS.0.@id NaN 4 REPAIR_BULLETINS.0.#text NaN Known target of the occult REPAIR_BULLETINS.1.@id NaN 5 REPAIR_BULLETINS.1.#text NaN Not meant to be thrown int... 
Sign up to request clarification or add additional context in comments.

1 Comment

Thank you for this! I've got a much larger file to apply this to with varying levels of data, but I think I can adapt this logic for that. Appreciate you taking the time to help!

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.