GitHub - benchling/sqlalchemy_batch_inserts: Batch inserts for psycopg2 on SQLAlchemy

Benchling uses SQLAlchemy and psycopg2 to talk to PostgreSQL. To save on round-trip latency, we batch our inserts using this code.

Typically, creating and flushing N models in SQLAlchemy does N roundtrips to the database if the model has an autoincrementing primary key. This module improves creating N models to only require 2 roundtrips, without requiring any other changes to your code.

Here's our accompanying blog post that describes how we investigated and improved insert performance: sqlalchemy_batch_inserts: a module for when you’re inserting thousands of rows and it’s slow

Is this for me?

You may find use for this module if:

You are using SQLAlchemy
You are using Postgres
You sometimes need to create several models at once and care about performance

Usage

from sqlalchemy import create_engine from sqlalchemy.orm import sessionmaker from sqlalchemy_batch_inserts import enable_batch_inserting engine = create_engine("postgresql+psycopg2://postgres@localhost", executemany_mode="values") # SQLAlchemy < 1.3.7 needs use_batch_mode=True instead Session = sessionmaker(bind=engine) session = Session() enable_batch_inserting(session)

If you use Flask-SQLAlchemy,

from flask_sqlalchemy import SignallingSession from sqlalchemy_batch_inserts import enable_batch_inserting # Make sure that you've specified executemany_mode or use_batch_mode when creating your engine! Otherwise # this library will not have any effect. enable_batch_inserting(SignallingSession)

Demo

docker build -t sqla_batch . docker run --rm -v $PWD:/src --name sqla_batch sqla_batch # Wait for it to finish spinning up # Switch to another shell docker exec -it sqla_batch src/demo.py no 100 docker exec -it sqla_batch src/demo.py yes 100

To simulate 100 * 3 inserts with 20 ms latency, first change the connection string in demo.py from postgresql+psycopg2://postgres@localhost to postgresql+psycopg2://postgres@db. Then,

docker network create sqla_batch docker run --rm --network sqla_batch --network-alias db --name db sqla_batch # Switch to another shell docker run -it -v /var/run/docker.sock:/var/run/docker.sock --network sqla_batch gaiaadm/pumba netem --duration 15m --tc-image gaiadocker/iproute2 delay --time 20 --jitter 0 db # Switch to another shell # This should take 100 * 3 * 20 ms = 6 seconds docker run -it --rm -v $PWD:/src --network sqla_batch sqla_batch src/demo.py no 100 # This should take 4 * 20 ms = 0.08 seconds docker run -it --rm -v $PWD:/src --network sqla_batch sqla_batch src/demo.py yes 100

Acknowledgements

This is all possible thanks to @dvarrazzo's psycopg2 execute_batch and @zzzeek's SQLAlchemy support for the same and helpful advice on the mailing list.

Maintainer notes

After bumping the version in setup.py and __version__ in __init__.py,

$ ./setup.py sdist $ twine upload --repository-url https://test.pypi.org/legacy/ dist/* # Check https://test.pypi.org/project/sqlalchemy-batch-inserts/ $ twine upload dist/*

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
sqlalchemy_batch_inserts		sqlalchemy_batch_inserts
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
demo.py		demo.py
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Is this for me?

Usage

Demo

Acknowledgements

Maintainer notes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 3

Uh oh!

Languages

License

benchling/sqlalchemy_batch_inserts

Folders and files

Latest commit

History

Repository files navigation

Is this for me?

Usage

Demo

Acknowledgements

Maintainer notes

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages