This is a sample JSON file I'm working with, containing 2 records:
    [
      {"Time": "2016-01-10", "ID": 13567,
       "Content": {
         "Event": "UPDATE",
         "Id": {"EventID": "ABCDEFG"},
         "Story": [{
           "@ContentCat": "News",
           "Body": "Related Meeting Memo: Engagement with target firm for potential M&A. Please be on call this weekend for news updates.",
           "BodyTextType": "PLAIN_TEXT",
           "DerivedId": {"Entity": [{"Id": "Amy", "Score": 70}, {"Id": "Jon", "Score": 70}]},
           "DerivedTopics": {"Topics": [{"Id": "Meeting", "Score": 70}, {"Id": "Performance", "Score": 70}, {"Id": "Engagement", "Score": 100}, {"Id": "Salary", "Score": 70}, {"Id": "Career", "Score": 100}]},
           "HotLevel": 0,
           "LanguageString": "ENGLISH",
           "Metadata": {"ClassNum": 50, "Headline": "Attn: Weekend", "WireId": 2035, "WireName": "IIS"},
           "Version": "Original"}]},
       "yyyymmdd": "20160110", "month": 201601},
      {"Time": "2016-01-12", "ID": 13568,
       "Content": {
         "Event": "DEAL",
         "Id": {"EventID": "ABCDEFG2"},
         "Story": [{
           "@ContentCat": "Details",
           "Body": "Test email contents",
           "BodyTextType": "PLAIN_TEXT",
           "DerivedId": {"Entity": [{"Id": "Bob", "Score": 100}, {"Id": "Jon", "Score": 70}, {"Id": "Jack", "Score": 60}]},
           "DerivedTopics": {"Topics": [{"Id": "Meeting", "Score": 70}, {"Id": "Engagement", "Score": 100}, {"Id": "Salary", "Score": 70}, {"Id": "Career", "Score": 100}]},
           "HotLevel": 0,
           "LanguageString": "ENGLISH",
           "Metadata": {"ClassNum": 70, "Headline": "Attn: Weekend", "WireId": 2037, "WireName": "IIS"},
           "Version": "Original"}]},
       "yyyymmdd": "20160112", "month": 201602}
    ]

I'm trying to get to a dataframe at the level of the entity IDs (extracting Amy and Jon from record 1 and Bob, Jon, Jack from record 2).
However, I'm already getting an error early on. Here's my code so far, assuming the sample JSON is saved as sample.json:

    import json
    from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

    data = json.load(open('sample.json'))
    test = json_normalize(data, record_path=['Content', 'Story'])

This results in the following error:

    TypeError: string indices must be integers

I suspect it's because Content.Story is actually a list containing a dictionary, instead of a dictionary itself. But it's not clear to me how to get past this.
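One quick way to check that suspicion is to look at the type found at each step of the path. The sketch below is only an illustration: it uses a trimmed copy of the first sample record (only the keys on the way down to the entities are kept), and the variable names are made up for this example. It shows that Content is a dictionary while Story is a list holding one dictionary. Iterating over a dictionary yields its string keys, and indexing a string with another string raises this kind of TypeError, so the dictionary at the Content level may be part of the problem as well.

```python
import json

# Trimmed copy of the first sample record: only the keys on the path
# Content > Story > DerivedId > Entity are kept (an assumption for brevity).
sample = json.loads("""
[{"ID": 13567,
  "Content": {"Event": "UPDATE",
              "Story": [{"DerivedId": {"Entity": [{"Id": "Amy", "Score": 70},
                                                  {"Id": "Jon", "Score": 70}]}}]}}]
""")

record = sample[0]

# Record the type name found at each level of the path.
content_type = type(record["Content"]).__name__
story_type = type(record["Content"]["Story"]).__name__
first_story_type = type(record["Content"]["Story"][0]).__name__

print("Content:", content_type)
print("Content.Story:", story_type)
print("Content.Story[0]:", first_story_type)

# Iterating over a dict gives its keys, which are plain strings here.
content_keys = list(record["Content"])
print("Iterating Content yields:", content_keys)
```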
EDIT: To clarify, I'm ultimately trying to get to the level of the entity IDs (Content > Story > DerivedId > Entity > Id). I was showing the Content.Story code example just to illustrate where I am right now in figuring this out.
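To make the target shape concrete, here is a minimal sketch of the entity-level rows I'm after, written with plain loops instead of json_normalize. It is not a claim about how json_normalize should be called. The helper name `entity_rows` and the column name `EntityId` are hypothetical, and the sample is trimmed to the keys on the path to the entities.

```python
import json

# Trimmed copy of the two sample records (assumption: only the keys needed
# to reach Content > Story > DerivedId > Entity are kept).
sample = json.loads("""
[{"Time": "2016-01-10", "ID": 13567,
  "Content": {"Story": [{"DerivedId": {"Entity": [
      {"Id": "Amy", "Score": 70}, {"Id": "Jon", "Score": 70}]}}]}},
 {"Time": "2016-01-12", "ID": 13568,
  "Content": {"Story": [{"DerivedId": {"Entity": [
      {"Id": "Bob", "Score": 100}, {"Id": "Jon", "Score": 70},
      {"Id": "Jack", "Score": 60}]}}]}}]
""")


def entity_rows(records):
    """Return one flat dict per entity, keeping the parent record's ID and Time."""
    rows = []
    for record in records:
        # Content is a dict; Story is a list of dicts.
        for story in record["Content"]["Story"]:
            # DerivedId is a dict; Entity is a list of dicts.
            for entity in story["DerivedId"]["Entity"]:
                rows.append({
                    "ID": record["ID"],
                    "Time": record["Time"],
                    "EntityId": entity["Id"],  # hypothetical column name
                    "Score": entity["Score"],
                })
    return rows


rows = entity_rows(sample)
for row in rows:
    print(row)
```

Passing a list of dictionaries like `rows` to `pandas.DataFrame(rows)` would give one dataframe row per entity.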
[['Content', 'Story']] (as you only have one record, Content.Story)