I am very puzzled as to the behavior of some multiprocessing code that is using psycopg2 to make queries in parallel to a postgres db.
Essentially, I am making the same query (with different params) to various partitions of a larger table. I am using multiprocessing.Pool to fork off a separate query.
My multiprocessing call looks like this:
pool = Pool(processes=num_procs) results=pool.map(run_sql, params_list) My run_sql code looks like this:
def run_sql(zip2): conn = get_connection() curs = conn.cursor() print "conn: %s curs:%s pid=%s" % (id(conn), id(curs), os.getpid()) ... curs.execute(qry) records = curs.fetchall() def get_connection() ... conn = psycopg2.connect(user=db_user, host=db_host, dbname=db_name, password=db_pwd) return conn So my expectation is that each process would get a separate db connection via the call to get_connection() and that print id(conn) would display a distinct value. However, that doesn't seem to be the case and I am at a loss to explain it. Even print id(curs) is the same. Only print os.getpid() shows a difference. Does it somehow use the same connection for each forked process ?
conn: 4614554592 curs:4605160432 pid=46802 conn: 4614554592 curs:4605160432 pid=46808 conn: 4614554592 curs:4605160432 pid=46810 conn: 4614554592 curs:4605160432 pid=46784 conn: 4614554592 curs:4605160432 pid=46811