Benchling uses SQLAlchemy and psycopg2 to talk to PostgreSQL. To save on round-trip latency, we batch our inserts using this code.
Typically, creating and flushing N models in SQLAlchemy does N roundtrips to the database if the model has an autoincrementing primary key. This module improves creating N models to only require 2 roundtrips, without requiring any other changes to your code.
Here's our accompanying blog post that describes how we investigated and improved insert performance: sqlalchemy_batch_inserts: a module for when you’re inserting thousands of rows and it’s slow
You may find use for this module if:
- You are using SQLAlchemy
- You are using Postgres
- You sometimes need to create several models at once and care about performance
from sqlalchemy import create_engine from sqlalchemy.orm import sessionmaker from sqlalchemy_batch_inserts import enable_batch_inserting engine = create_engine("postgresql+psycopg2://postgres@localhost", executemany_mode="values") # SQLAlchemy < 1.3.7 needs use_batch_mode=True instead Session = sessionmaker(bind=engine) session = Session() enable_batch_inserting(session)If you use Flask-SQLAlchemy,
from flask_sqlalchemy import SignallingSession from sqlalchemy_batch_inserts import enable_batch_inserting # Make sure that you've specified executemany_mode or use_batch_mode when creating your engine! Otherwise # this library will not have any effect. enable_batch_inserting(SignallingSession)docker build -t sqla_batch . docker run --rm -v $PWD:/src --name sqla_batch sqla_batch # Wait for it to finish spinning up # Switch to another shell docker exec -it sqla_batch src/demo.py no 100 docker exec -it sqla_batch src/demo.py yes 100 To simulate 100 * 3 inserts with 20 ms latency, first change the connection string in demo.py from postgresql+psycopg2://postgres@localhost to postgresql+psycopg2://postgres@db. Then,
docker network create sqla_batch docker run --rm --network sqla_batch --network-alias db --name db sqla_batch # Switch to another shell docker run -it -v /var/run/docker.sock:/var/run/docker.sock --network sqla_batch gaiaadm/pumba netem --duration 15m --tc-image gaiadocker/iproute2 delay --time 20 --jitter 0 db # Switch to another shell # This should take 100 * 3 * 20 ms = 6 seconds docker run -it --rm -v $PWD:/src --network sqla_batch sqla_batch src/demo.py no 100 # This should take 4 * 20 ms = 0.08 seconds docker run -it --rm -v $PWD:/src --network sqla_batch sqla_batch src/demo.py yes 100 This is all possible thanks to @dvarrazzo's psycopg2 execute_batch and @zzzeek's SQLAlchemy support for the same and helpful advice on the mailing list.
After bumping the version in setup.py and __version__ in __init__.py,
$ ./setup.py sdist $ twine upload --repository-url https://test.pypi.org/legacy/ dist/* # Check https://test.pypi.org/project/sqlalchemy-batch-inserts/ $ twine upload dist/*