This is a sample JSON file I'm working with, containing 2 records:

[
  {"Time": "2016-01-10",
   "ID": 13567,
   "Content": {
     "Event": "UPDATE",
     "Id": {"EventID": "ABCDEFG"},
     "Story": [{
       "@ContentCat": "News",
       "Body": "Related Meeting Memo: Engagement with target firm for potential M&A. Please be on call this weekend for news updates.",
       "BodyTextType": "PLAIN_TEXT",
       "DerivedId": {"Entity": [{"Id": "Amy", "Score": 70},
                                {"Id": "Jon", "Score": 70}]},
       "DerivedTopics": {"Topics": [{"Id": "Meeting", "Score": 70},
                                    {"Id": "Performance", "Score": 70},
                                    {"Id": "Engagement", "Score": 100},
                                    {"Id": "Salary", "Score": 70},
                                    {"Id": "Career", "Score": 100}]},
       "HotLevel": 0,
       "LanguageString": "ENGLISH",
       "Metadata": {"ClassNum": 50, "Headline": "Attn: Weekend", "WireId": 2035, "WireName": "IIS"},
       "Version": "Original"}]},
   "yyyymmdd": "20160110",
   "month": 201601},
  {"Time": "2016-01-12",
   "ID": 13568,
   "Content": {
     "Event": "DEAL",
     "Id": {"EventID": "ABCDEFG2"},
     "Story": [{
       "@ContentCat": "Details",
       "Body": "Test email contents",
       "BodyTextType": "PLAIN_TEXT",
       "DerivedId": {"Entity": [{"Id": "Bob", "Score": 100},
                                {"Id": "Jon", "Score": 70},
                                {"Id": "Jack", "Score": 60}]},
       "DerivedTopics": {"Topics": [{"Id": "Meeting", "Score": 70},
                                    {"Id": "Engagement", "Score": 100},
                                    {"Id": "Salary", "Score": 70},
                                    {"Id": "Career", "Score": 100}]},
       "HotLevel": 0,
       "LanguageString": "ENGLISH",
       "Metadata": {"ClassNum": 70, "Headline": "Attn: Weekend", "WireId": 2037, "WireName": "IIS"},
       "Version": "Original"}]},
   "yyyymmdd": "20160112",
   "month": 201602}
]

I'm trying to get to a dataframe at the level of the entity IDs (extracting Amy and Jon from record 1 and Bob, Jon, Jack from record 2).

However, I'm already getting an error early on. Here's my code so far, assuming the sample JSON is saved as sample.json:

import json
from pandas.io.json import json_normalize

data = json.load(open('sample.json'))
test = json_normalize(data, record_path=['Content', 'Story'])

This results in the following error:

TypeError: string indices must be integers 

I suspect it's because Content.Story is actually a list containing a dictionary, rather than a dictionary itself. But it's not clear to me how to get past this.

EDIT: To clarify, I'm ultimately trying to get to the level of the entity IDs (Content > Story > DerivedId > Entity > Id). I showed the Content.Story code example just to illustrate where I am right now in figuring this out.

  • Shouldn't it be [['Content', 'Story']]? (As you only have one record, Content.Story.) Commented Jul 8, 2018 at 22:32
  • You will get more and better answers if you create a Minimal, Complete, and Verifiable example. Especially make sure that the input and expected test data are complete (not pseudo-data), and can be easily cut and pasted into an editor to allow testing proposed solutions. Commented Jul 8, 2018 at 22:55

1 Answer

json_normalize(data, record_path=[['Content', 'Story']])

That should work.
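To illustrate why the nested list helps: when an element of record_path is itself a list, pandas follows those keys one after another inside each record, so ['Content', 'Story'] is walked as record['Content']['Story']. Below is a minimal sketch using a trimmed-down stand-in for sample.json (same nesting, fewer fields). It assumes pandas 1.0 or later, where the function is exposed as pd.json_normalize; on older versions the import from the question applies instead. The meta=['ID'] argument is an addition for illustration, not part of the original answer.

```python
import pandas as pd

# Trimmed-down stand-in for sample.json: same nesting, fewer fields.
data = [
    {"ID": 13567,
     "Content": {"Event": "UPDATE",
                 "Story": [{"Body": "Related Meeting Memo", "HotLevel": 0}]}},
    {"ID": 13568,
     "Content": {"Event": "DEAL",
                 "Story": [{"Body": "Test email contents", "HotLevel": 0}]}},
]

# The inner list ["Content", "Story"] is one path element: for each record,
# pandas looks up record["Content"]["Story"] and expands that list into rows.
# meta=["ID"] (an illustrative addition) copies the record-level ID onto each row.
stories = pd.json_normalize(data, record_path=[["Content", "Story"]], meta=["ID"])

print(stories)
```

Each story dictionary becomes one row, with its keys as columns and the parent record's ID alongside.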


3 Comments

Thanks. How do I get to the level of the entity IDs though (Content > Story > DerivedID > Entity > Id)?
That should be a new question :), Please make one and comment the link here. If this helped, please upvote and mark as accepted answer :)
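For the follow-up about entity IDs, one possible approach (a sketch, not something confirmed in this thread) is to flatten the nesting with plain loops and hand the resulting rows to pandas. The helper name entity_rows and the output keys ID, EntityId and Score are hypothetical choices made for this example; the data is a trimmed-down stand-in for sample.json.

```python
def entity_rows(records):
    """Yield one flat dict per entity under Content > Story > DerivedId > Entity.

    Hypothetical helper; the output key names are illustrative only.
    """
    for record in records:
        for story in record["Content"]["Story"]:
            for entity in story["DerivedId"]["Entity"]:
                yield {
                    "ID": record["ID"],          # record-level identifier
                    "EntityId": entity["Id"],    # e.g. "Amy"
                    "Score": entity["Score"],
                }


# Trimmed-down stand-in for sample.json, keeping only the fields used above.
data = [
    {"ID": 13567,
     "Content": {"Story": [{"DerivedId": {"Entity": [
         {"Id": "Amy", "Score": 70}, {"Id": "Jon", "Score": 70}]}}]}},
    {"ID": 13568,
     "Content": {"Story": [{"DerivedId": {"Entity": [
         {"Id": "Bob", "Score": 100}, {"Id": "Jon", "Score": 70},
         {"Id": "Jack", "Score": 60}]}}]}},
]

rows = list(entity_rows(data))
for row in rows:
    print(row)
```

Passing rows to pandas.DataFrame(rows) should then give one row per entity, with Amy and Jon from the first record and Bob, Jon and Jack from the second.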