Postgres connections in many threads

Question

I need advice on a special case.

I have a program like this:

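    import psycopg2
    from multiprocessing.pool import ThreadPool

    data = [...]

    def slow_function(data):
        db = psycopg2.connect(credentials)  # a new connection for every call
        cursor = db.cursor()
        new_data = realy_slow_func()
        some_query = "some update query"
        cursor.execute(some_query)
        db.commit()  # without a commit, psycopg2 discards the update on close
        db.close()

    ThreadPool(n).map(slow_function, data)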

Is opening a new connection in each thread safe? I don't mind if it's slow, and I know faster approaches exist.

Threads are necessary because realy_slow_func() is slow.
The database credentials are the same for every thread.
I am using psycopg2.
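The one-connection-per-thread approach described above can be sketched as follows. psycopg2's documentation describes its connections as thread-safe and lists "a connection per thread" as a supported pattern, so the main things to get right are committing, closing, and not sharing a cursor between threads. This is a minimal sketch under assumptions: `update_one` and `update_all` are hypothetical names, the connection factory is passed in as `connect` (for example `lambda: psycopg2.connect(credentials)`), and the query is assumed to take two parameters (the new value and the item). It also runs the slow work before connecting, so each connection is held only for the write.

```python
from functools import partial
from multiprocessing.pool import ThreadPool


def update_one(connect, slow_func, query, item):
    """Do the slow work first, then open a short-lived connection for the write."""
    new_value = slow_func(item)      # no connection is held during the slow part
    conn = connect()                 # hypothetical factory, e.g. lambda: psycopg2.connect(dsn)
    try:
        cur = conn.cursor()          # this cursor is used by this thread only
        try:
            cur.execute(query, (new_value, item))
        finally:
            cur.close()
        conn.commit()                # psycopg2 does not autocommit by default
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()                 # every task closes the connection it opened
    return new_value


def update_all(connect, slow_func, query, items, n_threads=8):
    """Run update_one for every item on a fixed number of threads."""
    with ThreadPool(processes=n_threads) as tp:
        return tp.map(partial(update_one, connect, slow_func, query), items)
```

Passing `connect` in rather than calling `psycopg2.connect` directly keeps the credentials in one place and lets the worker be exercised without a live server.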

shoaib30 · Accepted Answer · 2021-08-05 08:59:42Z

You should be using a connection pool, which creates a set of connections and reuses them across your threads. I would also suggest using a ThreadPool, so that the number of threads running at any one time equals the number of connections available in the DB connection pool. For the scope of this question, though, I will focus on the DB connection pool.
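Sizing the ThreadPool to the connection pool, as suggested above, can be sketched like this. It is a minimal sketch assuming the pool object exposes psycopg2-style `getconn()` and `putconn()` methods (as `ThreadedConnectionPool` does); `task`, `run_all`, `worker`, and `MAX_CONN` are hypothetical names. Because at most `max_conn` tasks run at once, the threads never ask for more connections than the pool's `maxconn`; psycopg2's pools raise `PoolError` ("connection pool exhausted") when that limit is exceeded.

```python
from functools import partial
from multiprocessing.pool import ThreadPool

MAX_CONN = 20  # assumption: the same number passed as maxconn to the connection pool


def task(pool, worker, item):
    """Borrow a connection inside the thread, run worker(conn, item), always return it."""
    conn = pool.getconn()
    try:
        return worker(conn, item)
    finally:
        pool.putconn(conn)  # runs even if worker raises


def run_all(pool, worker, items, max_conn=MAX_CONN):
    """Run one task per item with no more threads than the pool has connections."""
    with ThreadPool(processes=max_conn) as tp:
        return tp.map(partial(task, pool, worker), items)
```

With psycopg2, `pool` would be something like `ThreadedConnectionPool(1, MAX_CONN, dsn)` and `worker` a function that takes a connection and an item and runs the update.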

I have not tested the code, but this is roughly how it would look. First create a connection pool, then get a connection from it inside your thread, and release the connection once the work is done. Alternatively, you could get and release connections outside the thread: pass the connection to the thread as a parameter and release it once the thread completes.
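The alternative mentioned above, where the caller gets and releases connections and the thread only receives one as a parameter, might look like this. It is a sketch under assumptions: `run_in_batches` and `worker` are hypothetical names, the pool is assumed to have psycopg2-style `getconn()`/`putconn()` methods, and `batch_size` should not exceed the pool's `maxconn`.

```python
import threading


def run_in_batches(pool, worker, items, batch_size):
    """Run worker(conn, item) in one thread per item, at most batch_size at a time.

    The caller, not the worker, borrows each connection from the pool and
    returns it only after the thread using it has finished.
    """
    for start in range(0, len(items), batch_size):
        running = []
        try:
            for item in items[start:start + batch_size]:
                conn = pool.getconn()            # acquired outside the worker thread
                try:
                    t = threading.Thread(target=worker, args=(conn, item))
                    t.start()
                except BaseException:
                    pool.putconn(conn)           # thread never started; give the connection back
                    raise
                running.append((t, conn))
        finally:
            for t, conn in running:
                t.join()                         # wait for this thread to finish...
                pool.putconn(conn)               # ...then release its connection
```

Exceptions raised inside a `threading.Thread` target are not re-raised in the caller, so in this design the worker should handle and log its own database errors.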

Note that the pool is created with ThreadedConnectionPool, which, as the name suggests, is the pool class that works with threads.

From the docs:

"A connection pool that works with the threading module. Note: This pool class can be safely used in multi-threaded applications."

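    import psycopg2
    from psycopg2 import pool
    from multiprocessing.pool import ThreadPool

    # minconn=1, maxconn=20
    postgreSQL_pool = psycopg2.pool.ThreadedConnectionPool(
        1, 20,
        user="postgres", password="pass@#29",
        host="127.0.0.1", port="5432", database="postgres_db")

    data = [...]

    def slow_function(data):
        db = postgreSQL_pool.getconn()
        try:
            cursor = db.cursor()
            new_data = realy_slow_func()
            some_query = "some update query"
            cursor.execute(some_query)
            cursor.close()
            db.commit()  # psycopg2 does not autocommit by default
        finally:
            postgreSQL_pool.putconn(db)  # return the connection even on errors

    # run at most maxconn (20) threads so the pool is never exhausted
    with ThreadPool(20) as tp:
        tp.map(slow_function, data)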

Source: https://pynative.com/psycopg2-python-postgresql-connection-pooling/

Docs: https://www.psycopg.org/docs/pool.html

I wanted to avoid ThreadedConnectionPool, but you are right. That's simply the answer.
@PatrykOrganiściak Any specific reason to avoid it? If so, I can think of ways around it.
I was already using that approach (one connection per thread) in some scrapers and was curious whether I really should change it, because it hasn't caused any problems yet (tested with 40 threads, each running a few queries per second). It was also just easier for me :) You convinced me to change it anyway, because the pool looks really simple too. Thanks!
@PatrykOrganiściak Depending on where your DB is hosted, there will be a maximum number of connections (max_connections), which limits how many threads can run with independent connections. If it's local, you can raise it until you run short of resources. Then you can use Python's ThreadPool (from multiprocessing.pool) to limit the number of concurrent threads to the maximum number of connections. That way you don't need a PG pool at all.
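Sizing the ThreadPool from the server's connection limit, as suggested above, could be sketched with a small helper. The helper and its parameters (`threads_for_server`, `reserve`, `cap`) are hypothetical: it reads the limit with `SHOW max_connections` and leaves a few slots free, since other sessions and the slots kept by `superuser_reserved_connections` (3 by default) also count against that limit. The exact reserve is a judgment call, not a rule.

```python
def threads_for_server(conn, reserve=5, cap=None):
    """Pick a thread count from the server's max_connections, leaving some slots free."""
    cur = conn.cursor()
    try:
        cur.execute("SHOW max_connections")
        (value,) = cur.fetchone()    # SHOW returns the setting as text, e.g. '100'
    finally:
        cur.close()
    n = max(1, int(value) - reserve)  # never return fewer than one thread
    return min(n, cap) if cap is not None else n
```

With psycopg2, `conn` would be an ordinary connection such as `psycopg2.connect(credentials)`, and the result would be passed as `ThreadPool(processes=...)`, with each thread opening its own connection.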