deleted 5 characters in body

edited Oct 29, 2015 at 18:43

Efficient and elegant:

    import pandas as pd

    tf = pd.DataFrame([
        {'id': 1, 'nested': {'a': 1, 'b': 2}},
        {'id': 2, 'nested': {'a': 2, 'b': 4}},
        {'id': 3, 'nested': {'a': 3, 'b': 6}},
        {'id': 4, 'nested': {'a': 4}},
    ])

    def unpack(df, column, fillna=None):
        # Expand each nested dict into one row; keys missing from a dict become NaN.
        # Series.items() replaces iteritems(), which was removed in pandas 2.0.
        expanded = pd.DataFrame(d for idx, d in df[column].items())
        if fillna is not None:
            expanded = expanded.fillna(fillna)
        # Append the new columns, then drop the original nested column.
        ret = pd.concat([df, expanded], axis=1)
        del ret[column]
        return ret

    unpack(tf, 'nested', 0)

will yield

       id  a  b
    0   1  1  2
    1   2  2  4
    2   3  3  6
    3   4  4  0

and seems to be quite efficient

    tf = pd.DataFrame([
        {'id': i, 'nested': {'a': i, 'b': i*2}}
        for i in range(100000)  # xrange on Python 2
    ])
    %timeit unpack(tf, 'nested')  # gives 10 loops, best of 3: 95 ms per loop

If you want to create a projection (select only a subset of the keys in the nested dict), you can either use apply on the column before calling unpack, or select columns from the DataFrame that unpack creates internally.
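Both projection options can be sketched roughly as follows. This is only an illustration: the `wanted` list and the smaller sample frame are made up for the example, and `unpack` is restated in a compact form (using `Series.items()` and assuming a default integer index) so that the snippet runs on its own.

```python
import pandas as pd

tf = pd.DataFrame([
    {'id': 1, 'nested': {'a': 1, 'b': 2}},
    {'id': 2, 'nested': {'a': 2, 'b': 4}},
    {'id': 4, 'nested': {'a': 4}},
])

def unpack(df, column, fillna=None):
    # Compact restatement of the function above (assumes a default RangeIndex).
    expanded = pd.DataFrame(d for _, d in df[column].items())
    if fillna is not None:
        expanded = expanded.fillna(fillna)
    return pd.concat([df.drop(columns=column), expanded], axis=1)

wanted = ['a']  # hypothetical subset of nested keys to keep

# Option 1: trim each dict with apply, then unpack the trimmed column.
trimmed = tf.assign(nested=tf['nested'].apply(
    lambda d: {k: v for k, v in d.items() if k in wanted}))
via_apply = unpack(trimmed, 'nested')

# Option 2: unpack everything, then select the columns of interest.
via_columns = unpack(tf, 'nested')[['id'] + wanted]
```

Option 1 avoids building columns that are thrown away afterwards, which may matter when the nested dicts carry many keys; option 2 is shorter when they do not.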

The main advantages of the above solution are:

it is much more generic - it does not depend on the keys in your nested document
it is efficient - it uses (presumably optimized) pandas methods wherever possible, together with generators/iterators
it handles keys that exist only in some nested documents and lets you specify how they should be handled (a fillna value or NaN)
it can be converted to a one-liner for the sake of brevity
it does not reinvent anything
it uses naming consistent with other libraries (the SFrame.unpack method in Dato's GraphLab Create)
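As a rough illustration of the one-liner and missing-key points, the sketch below inlines the same idea for a column named `nested`. It is a variation rather than the exact function above: it uses `Series.tolist()` instead of a generator, the two-row sample frame is made up for the example, and it assumes the frame has a default integer index.

```python
import pandas as pd

tf = pd.DataFrame([
    {'id': 1, 'nested': {'a': 1, 'b': 2}},
    {'id': 2, 'nested': {'a': 2}},  # key 'b' is missing here
])

# One-liner without a fill value: the missing key shows up as NaN.
flat = pd.concat([tf.drop(columns='nested'), pd.DataFrame(tf['nested'].tolist())], axis=1)

# Same expression with a fill value for keys missing from some dicts.
flat_filled = pd.concat(
    [tf.drop(columns='nested'), pd.DataFrame(tf['nested'].tolist()).fillna(0)], axis=1)
```

Note that a column with gaps is stored as floats, so filling with 0 leaves it as a float column unless it is cast back explicitly.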

added 568 characters in body

edited Oct 29, 2015 at 18:37

JohnnyM

answered Oct 29, 2015 at 16:58

JohnnyM
