
We have been researching this for hours now with no luck. There are many ways to serialise and deserialise objects in Python, but we need a simple and standard one that respects type hints, for example:

from typing import List, NamedTuple

class Address(object):
    city: str
    postcode: str

class Person(NamedTuple):
    name: str
    addresses: List[Address]

My ask is extremely simple: I am looking for a standard way to convert to and from JSON, without the need to write the serialisation/deserialisation code for every class, for example:

json = '{ "name": "John", "addresses": [{ "postcode": "EC2 2FA", "city": "London" }, { "city": "Paris", "postcode": "545887", "extra_attribute": "" }]}' 

I need a way to:

p = magic(json, Person)  # or something similar
print(type(p))           # should print Person
for a in p.addresses:
    print(type(a))       # prints Address
    print(a.city)        # should print London then Paris

json2 = unmagic(p)
print(json2 == json)     # prints true (probably there will be difference in spacing, but just to clarify the idea)

I have worked in programming for 15 years and have been using Python for a year, and even after extensive research I am still not sure of the best way to very simply serialise/deserialise a structure of POCO objects. I feel dumb.

Edit

Options explored so far have one or more of the following limitations:

  • Depend on the order of elements within the JSON / class definition instead of names of the attributes (the previous example would fail because city and postcode are mixed up).
  • Fail if there are extra details in the JSON (the previous example would fail because there is an extra_attribute).
  • Return dictionary instead of a typed object, or SimpleNamespace, and not an object of the intended type.
  • Require writing serialisation/deserialisation code for each and every different class, which is extremely error-prone.
  • Check out Pydantic library Commented Feb 27, 2021 at 15:07
  • Thanks mate, I will have a look, but I am hoping to find a native way, I feel that the ask is really simple :) Commented Feb 27, 2021 at 15:11
  • Natively you are not going to find anything without rolling your own. Python's json package is very basic, just encode/decode. Commented Feb 27, 2021 at 15:18
  • Note that json == json2 won't be true because you put extra attributes in the input JSON. Commented Mar 12, 2021 at 13:12

4 Answers


I generally use the Marshmallow project to handle JSON serialisation, deserialisation, and validation. When combined with marshmallow-dataclass or, when using SQLAlchemy database models, marshmallow-sqlalchemy, you can produce Marshmallow schemas straight from existing object definitions. You work with instances of the models themselves: dataclass instances or SQLAlchemy ORM model instances.

Marshmallow schemas also let you define what happens with extra values in the JSON document; you can ignore these, or raise an exception for them, and vary this per model (models can be nested as needed). You can also reuse schemas for subsets of the fields.

Your small sample model, using marshmallow-dataclass, could be defined as:

import marshmallow
from marshmallow_dataclass import dataclass
from typing import List

class BaseSchema(marshmallow.Schema):
    class Meta:
        unknown = marshmallow.EXCLUDE

@dataclass(base_schema=BaseSchema)
class Address:
    city: str
    postcode: str

@dataclass(base_schema=BaseSchema)
class Person:
    name: str
    addresses: List[Address]

and apart from pip install marshmallow-dataclass before attempting to run the above, that's it. This example uses an explicit base schema to set the unknown configuration to EXCLUDE, which means: ignore extra attributes in the JSON when loading.

To either deserialize from JSON data, or to serialise to JSON, create an instance of the schema; each dataclass class has a Schema attribute referencing the corresponding (generated) Marshmallow schema object:

>>> schema = Person.Schema()
>>> json = '{ "name": "John", "addresses": [{ "postcode": "EC2 2FA", "city": "London" }, { "city": "Paris", "postcode": "545887", "extra_attribute": "" }]}'
>>> p = schema.loads(json)
>>> p
Person(name='John', addresses=[Address(city='London', postcode='EC2 2FA'), Address(city='Paris', postcode='545887')])
>>> print(type(p))  # should print Person
<class '__main__.Person'>
>>> for a in p.addresses:
...     print(type(a))  # prints Address
...     print(a.city)   # should print London then Paris
...
<class '__main__.Address'>
London
<class '__main__.Address'>
Paris
>>> schema.dumps(p)
'{"name": "John", "addresses": [{"postcode": "EC2 2FA", "city": "London"}, {"postcode": "545887", "city": "Paris"}]}'

The Schema.loads() and Schema.dumps() methods accept and produce JSON strings. You can also work with plain Python dictionaries and lists (the types that would be serialisable to JSON using the standard library json module), via Schema.load() and Schema.dump().
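
As a small illustration, here is a sketch reusing the schema and p objects from the interpreter session above; the only= line shows the field-subset reuse mentioned earlier (the exact dict contents are what I would expect from the model as defined, not output from the original post):

>>> data = schema.dump(p)     # plain dict / list structure, ready for the stdlib json module
>>> data['name']
'John'
>>> schema.load(data)         # back to a Person instance
Person(name='John', addresses=[Address(city='London', postcode='EC2 2FA'), Address(city='Paris', postcode='545887')])
>>> Person.Schema(only=("name",)).dump(p)   # reuse the schema for a subset of the fields
{'name': 'John'}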

For more complex setups you may need to configure the exact validation rules for fields, or exclude some fields from serialisation. You do this with the standard dataclasses.field() function, passing in Marshmallow field options via the metadata argument. marshmallow-dataclass can work out what exact Marshmallow field type to use, but you can always override this. And you can use the NewType() class to define reusable definitions for this; SomeType = NewType("SomeType", python_type, field=MarshmallowField, **field_args) lets you mark dataclass fields as field_name: SomeType in your project.
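
For illustration, a minimal sketch of both mechanisms, reusing the BaseSchema defined above; the Email type, the Contact class and the specific validators are made up for this example, not part of the original question:

import marshmallow
from dataclasses import field
from marshmallow_dataclass import NewType, dataclass

# Reusable definition: any field annotated as Email is handled by marshmallow's Email field.
Email = NewType("Email", str, field=marshmallow.fields.Email)

@dataclass(base_schema=BaseSchema)
class Contact:
    name: str = field(metadata={"validate": marshmallow.validate.Length(min=1)})
    email: Email
    notes: str = field(default="", metadata={"load_only": True})  # accepted on load, never dumped

Contact.Schema() then applies these rules on load, exactly like the generated Person and Address schemas.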

Marshmallow is, at least for me, the Swiss Army Knife project of serialisation and deserialisation, and there are lots of resources that integrate with Marshmallow. E.g. I'm looking at building several RESTful APIs for a customer at the moment, and I'll definitely be using Flask-Smorest to define the API endpoints and generate OpenAPI documentation at the same time. And all I have to do is create the SQLAlchemy models for this, really.

Here is an example Flask RESTful API based on your Person & Address schema, but using SQLAlchemy models:

# pip install Flask flask-marshmallow flask-smorest flask-sqlalchemy marshmallow-sqlalchemy
import marshmallow
from flask import Flask
from flask.views import MethodView
from flask_marshmallow import Marshmallow
from flask_smorest import Api, Blueprint, abort
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['API_TITLE'] = 'ContactBook'
app.config['API_VERSION'] = 'v1'
app.config['OPENAPI_VERSION'] = '3.0.3'
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///:memory:'

api = Api(app)
db = SQLAlchemy(app)
ma = Marshmallow(app)

class Address(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    city = db.Column(db.String)
    postcode = db.Column(db.String)
    person_id = db.Column(db.Integer, db.ForeignKey('person.id'), nullable=False)

class Person(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String)
    addresses = db.relationship('Address', backref='person', lazy=True)

# create tables in the (in-memory, temporary) database
db.create_all()

class BaseSQLAlchemyAutoSchema(ma.SQLAlchemyAutoSchema):
    def update(self, instance, **data):
        for fname in self.fields:
            if fname not in data:
                continue
            setattr(instance, fname, data.get(fname))

class AddressSchema(BaseSQLAlchemyAutoSchema):
    class Meta:
        table = Address.__table__

class PersonSchema(BaseSQLAlchemyAutoSchema):
    class Meta:
        table = Person.__table__

    addresses = ma.List(ma.Nested(AddressSchema(unknown=marshmallow.EXCLUDE)))

class PersonQueryArgsSchema(ma.Schema):
    name = ma.String()
    city = ma.String()

blp = Blueprint(
    "people", "people", url_prefix="/people", description="Operations on people"
)

@blp.route("/")
class People(MethodView):
    @blp.arguments(PersonQueryArgsSchema, location="query")
    @blp.response(200, PersonSchema(many=True))
    def get(self, args):
        """List people"""
        query = Person.query
        if args.get("name"):
            query = query.filter(Person.name == args["name"])
        if args.get("city"):
            query = query.filter(Person.addresses.any(Address.city == args["city"]))
        return query

    @blp.arguments(PersonSchema(unknown=marshmallow.EXCLUDE))
    @blp.response(201, PersonSchema)
    def post(self, new_person):
        """Add a new person"""
        addresses = new_person.pop("addresses", ())
        person = Person(**new_person)
        for address in addresses:
            person.addresses.append(Address(**address))
        db.session.add(person)
        db.session.commit()
        return person

@blp.route("/<person_id>")
class PersonById(MethodView):
    @blp.response(200, PersonSchema)
    def get(self, person_id):
        """Get person by ID"""
        return Person.query.get_or_404(person_id)

    @blp.arguments(PersonSchema(unknown=marshmallow.EXCLUDE, exclude=('addresses',)))
    @blp.response(200, PersonSchema)
    def put(self, updated_person_data, person_id):
        """Update existing person"""
        person = Person.query.get_or_404(person_id)
        PersonSchema().update(person, **updated_person_data)
        db.session.commit()
        return person

    @blp.response(204)
    def delete(self, person_id):
        """Delete person"""
        db.session.delete(Person.query.get_or_404(person_id))
        db.session.commit()  # persist the deletion

api.register_blueprint(blp)

Voila, a full-featured REST API that lets us list, update, create and delete Person entries.


2 Comments

In the above example, say I have a column called postcode but the JSON response requires postalcode. How do I achieve this?
@avi: marshmallow fields can be configured to use a different name in the JSON representation with the data_key argument.
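
For instance, a small sketch against the SQLAlchemy example above (the explicit field override is illustrative, not part of the original answer):

class AddressSchema(BaseSQLAlchemyAutoSchema):
    class Meta:
        table = Address.__table__

    # expose the `postcode` column as `postalcode` in the JSON document
    postcode = ma.String(data_key="postalcode")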

You can use dataclasses and the dacite library to solve this problem. Here's my example:

from dataclasses import dataclass, asdict
from typing import List
from dacite import from_dict

@dataclass
class Address:
    city: str
    postcode: str

@dataclass
class Person:
    name: str
    addresses: List[Address]

So if you want to serialise a Person instance you can do:

address1 = Address("London", "EC2 2FA")
address2 = Address("Paris", "545887")
person = Person(name='John', addresses=[address1, address2])

json = asdict(person)
print(json)

Which will print your person information as:

{'name': 'John', 'addresses': [{'city': 'London', 'postcode': 'EC2 2FA'}, {'city': 'Paris', 'postcode': '545887'}]} 

Although a native way was preferred, there's no easy way of accomplishing all the requirements natively. Assuming that you don't want to drop any requirement, the simplest solution I found is using the dacite library. It has essentially one function, from_dict(data_class, data), which takes care of nested dataclass creation and ignores extra arguments in the JSON, among many other things.

person2 = from_dict(Person, json) 

This complies with all your requirements:

json = '{ "name": "John", "addresses": [{ "postcode": "EC2 2FA", "city": "London" }, { "city": "Paris", "postcode": "545887", "extra_attribute": "" }]}' p = from_dict(Person, json) print(type(p)) # should print Person for a in p.addresses: print(type(a)) # prints Address print(a.city) # should print London then Paris json2 = asdict(p) print(json) print(json2) 

Results in:

<class '__main__.Person'>
<class '__main__.Address'>
London
<class '__main__.Address'>
Paris
{'name': 'John', 'addresses': [{'postcode': 'EC2 2FA', 'city': 'London'}, {'city': 'Paris', 'postcode': '545887', 'extra_attribute': ''}]}
{'name': 'John', 'addresses': [{'city': 'London', 'postcode': 'EC2 2FA'}, {'city': 'Paris', 'postcode': '545887'}]}

Warning: json will not be equal to json2 in this case, since asdict(p) generates the dict with the elements in declaration order and drops the unknown extra_attribute. Nonetheless, objects created from this json2 will have values equal to the objects created from json.
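
To check that last claim, a quick sketch reusing p, json2 and the dataclasses defined above:

p_roundtrip = from_dict(Person, json2)
print(p_roundtrip == p)  # True: dataclasses compare by field values, so the round trip preserves the data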

5 Comments

Dear friend, this is a solution we explored, but it depends on the order of params and will fail if you add extra params to the JSON or to the class itself; everything will fall apart.
I highlighted this in my question, sorry if it wasn't clear! We also explored NamedTuples with no luck.
I have two questions about this problem. What's the expected outcome if JSON has extra params and what's the expected outcome if the class has more fields than JSON params? Do you want something scalable where old versions of JSON are still usable by a newer class?
I would expect extra data in the JSON to be ignored, and binding of the data to be by name and not by order of fields. Exactly, that is the purpose: making sure that different versions of the JSON and the class still work together with no issues, which is why JSON is useful IMHO.
Although a native way is possible, I have found that the dacite library is the simplest way of solving this problem. If you want to really stick to native code and not use any library, you can start from stackoverflow.com/a/54769644/15321776 for nested class creation and use decorators to ignore extra arguments.
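
For reference, a rough stdlib-only sketch of that idea (illustrative only, not from the linked answer; the native_from_dict name is made up). It binds by field name, ignores extra keys, and recurses into nested dataclasses and lists of dataclasses, assuming Python 3.8+ and non-string annotations; it does no validation:

from dataclasses import fields, is_dataclass
from typing import get_args, get_origin

def native_from_dict(cls, data):
    """Build an instance of `cls` from a dict, keeping only known fields (matched by name)."""
    kwargs = {}
    for f in fields(cls):
        if f.name not in data:
            continue
        value = data[f.name]
        if is_dataclass(f.type):
            value = native_from_dict(f.type, value)
        elif get_origin(f.type) is list and get_args(f.type) and is_dataclass(get_args(f.type)[0]):
            item_type = get_args(f.type)[0]
            value = [native_from_dict(item_type, item) for item in value]
        kwargs[f.name] = value
    return cls(**kwargs)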

First:

pip install dacite 

Second: create dto.py

import logging
from typing import Optional, List, cast
from dataclasses import dataclass
from dacite import from_dict

logging.basicConfig(
    filename='response.log',
    level=logging.INFO,
    format='%(asctime)s %(levelname)-8s [%(filename)s:%(lineno)d:%(process)s] %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S',
)

SENTINEL = cast(None, object())

@dataclass
class Address:
    city: Optional[str] = SENTINEL
    postcode: Optional[str] = SENTINEL

    def asdict(self):
        return {k: v for k, v in self.__dict__.items() if v is not SENTINEL}

@dataclass
class Person:
    name: Optional[str] = SENTINEL
    addresses: Optional[List[Address]] = SENTINEL

    def asdict(self):
        return {k: v for k, v in self.__dict__.items() if v is not SENTINEL}

if __name__ == '__main__':
    SAMPLE = {
        "name": "John",
        "addresses": [
            {
                "postcode": "EC2 2FA",
                "city": "London"
            },
            {
                "city": "Paris",
                "postcode": "545887",
                "extra_attribute": ""
            }
        ]
    }
    try:
        targetClass = Address
        INFORMATION = from_dict(
            data_class=Person,
            data=SAMPLE
        )
        # TODO: Should be omitted (just for your questions).
        logging.info(" -- type(INFORMATION): " + str(type(INFORMATION)))
        # TODO: Should be omitted (just for your questions).
        for a in INFORMATION.addresses:
            logging.info(" -- type(a): " + str(type(a)))
            logging.info(" -- a.city: " + str(a.city))
        INFORMATION = INFORMATION.asdict()
        for key, value in INFORMATION.items():
            if isinstance(value, targetClass):
                INFORMATION.update({key: value.asdict()})
            if isinstance(value, list) and value and isinstance(value[0], targetClass):
                INFORMATION.update({key: [v.asdict() for v in value]})
    except Exception as e:
        logging.error('Error: {}'.format(e))
    finally:
        # TODO: Should be omitted (just for your questions).
        logging.info(" -- json: " + str(SAMPLE))
        # TODO: Should be omitted (just for your questions).
        logging.info(" -- json2: " + str(INFORMATION))
        # TODO: Should be omitted (just for your questions).
        logging.info(" -- json2 == json: " + str(INFORMATION == SAMPLE))

Third: see response.log

2021-03-11 12:49:08 INFO     [dto.py:66:42426] -- type(INFORMATION): <class '__main__.Person'>
2021-03-11 12:49:08 INFO     [dto.py:72:42426] -- type(a): <class '__main__.Address'>
2021-03-11 12:49:08 INFO     [dto.py:76:42426] -- a.city: London
2021-03-11 12:49:08 INFO     [dto.py:72:42426] -- type(a): <class '__main__.Address'>
2021-03-11 12:49:08 INFO     [dto.py:76:42426] -- a.city: Paris
2021-03-11 12:49:08 INFO     [dto.py:92:42426] -- json: {'name': 'John', 'addresses': [{'postcode': 'EC2 2FA', 'city': 'London'}, {'city': 'Paris', 'postcode': '545887', 'extra_attribute': ''}]}
2021-03-11 12:49:08 INFO     [dto.py:96:42426] -- json2: {'name': 'John', 'addresses': [{'city': 'London', 'postcode': 'EC2 2FA'}, {'city': 'Paris', 'postcode': '545887'}]}
2021-03-11 12:49:08 INFO     [dto.py:100:42426] -- json2 == json: False



You can use the built-in dataclasses module, along with a preferred (de)serialization library such as dataclass-wizard, in order to achieve the desired results.

First, start off by defining the class model or schema, using the @dataclass decorator:

from __future__ import annotations  # can be removed in PY 3.9+
from dataclasses import dataclass

@dataclass
class Address:
    city: str
    postcode: str

@dataclass
class Person:
    name: str
    addresses: list[Address]

Then, install any desired (third-party) libraries:

pip install dacite dataclass-wizard 

Adding a quick test, in Python code:

from dataclass_wizard import fromdict, asdict

json_dict = {
    "name": "John",
    "addresses": [{"postcode": "EC2 2FA", "city": "London"},
                  {"city": "Paris", "postcode": "545887", "extra_attribute": ""}]
}

p = fromdict(Person, json_dict)  # or something similar
print(type(p))       # should print Person
for a in p.addresses:
    print(type(a))   # prints Address
    print(a.city)    # should print London then Paris

json_dict2 = asdict(p)

# removes extra data, since that throws off comparison
json_dict['addresses'][-1].pop('extra_attribute')

print(json_dict2 == json_dict)  # prints true

Output:

<class '__main__.Person'>
<class '__main__.Address'>
London
<class '__main__.Address'>
Paris
True

Measuring Performance

Here's a quick test using the timeit module to measure (de)serialization times against the dacite and dataclasses libraries. A fun fact: serialization with dataclass-wizard is slightly faster than the built-in asdict helper function :)

from timeit import timeit

import dacite
import dataclasses
import dataclass_wizard

json_dict = {"name": "John", "addresses": [{"postcode": "EC2 2FA", "city": "London"}, {"city": "Paris", "postcode": "545887", "extra_attribute": ""}]}

n = 10_000

print(f'dataclass_wizard.fromdict: {timeit("dataclass_wizard.fromdict(Person, json_dict)", globals=globals(), number=n):.3f}')
print(f'dacite.from_dict: {timeit("dacite.from_dict(Person, json_dict)", globals=globals(), number=n):.3f}')

p1 = dataclass_wizard.fromdict(Person, json_dict)  # or something similar
p2 = dacite.from_dict(Person, json_dict)           # or something similar

assert p1 == p2

print(f'dataclass_wizard.asdict: {timeit("dataclass_wizard.asdict(p2)", globals=globals(), number=n):.3f}')
print(f'dataclasses.asdict: {timeit("dataclasses.asdict(p2)", globals=globals(), number=n):.3f}')

Results on my M1 Mac:

dataclass_wizard.fromdict: 0.025
dacite.from_dict: 0.972
dataclass_wizard.asdict: 0.028
dataclasses.asdict: 0.054

