There is a DataFrame.to_sql method, but it works only for MySQL, SQLite and Oracle databases. I can't pass a Postgres connection or a SQLAlchemy engine to this method.
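For context, a minimal sketch of the situation being described, with placeholder connection details (on pandas 0.13 and earlier this kind of call is not supported):

import psycopg2
import pandas as pd

df = pd.DataFrame({'id': [1, 2], 'name': ['a', 'b']})
con = psycopg2.connect(dbname='mydatabase', user='username',
                       password='password', host='localhost')
# pandas <= 0.13 has no Postgres flavor, so it cannot write through this connection
df.to_sql('table_name', con)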
8 Answers
Starting from pandas 0.14 (released end of May 2014), postgresql is supported. The sql module now uses sqlalchemy to support different database flavors. You can pass a sqlalchemy engine for a postgresql database (see docs). E.g.:
from sqlalchemy import create_engine
engine = create_engine('postgresql://username:password@localhost:5432/mydatabase')
df.to_sql('table_name', engine)

You are correct that in pandas up to version 0.13.1 postgresql was not supported. If you need to use an older version of pandas, here is a patched version of pandas.io.sql: https://gist.github.com/jorisvandenbossche/10841234.
I wrote this a while ago, so I cannot fully guarantee that it always works, but the basis should be there. If you put that file in your working directory and import it, then you should be able to do (where con is a postgresql connection):
import sql  # the patched version (file is named sql.py)
sql.write_frame(df, 'table_name', con, flavor='postgresql')
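For completeness, con here is a plain psycopg2 connection; a minimal sketch, with placeholder connection parameters:

import psycopg2
con = psycopg2.connect(dbname='mydatabase', user='username',
                       password='password', host='localhost', port=5432)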
Faster option:
The following code will copy your pandas DataFrame to a Postgres DB much faster than the df.to_sql method, and you won't need any intermediate CSV file to store the df.
Create an engine based on your DB specifications.
Create a table in your Postgres DB that has the same number of columns as the DataFrame (df).
Data in the DataFrame will get inserted into your Postgres table.
from sqlalchemy import create_engine
import psycopg2
import io

If you want to replace the table, we can first replace it with the normal to_sql method using just the headers from our df, and then load the entire big, time-consuming df into the DB with COPY.
engine = create_engine(
    'postgresql+psycopg2://username:password@host:port/database')

# Drop old table and create new empty table
df.head(0).to_sql('table_name', engine, if_exists='replace', index=False)

conn = engine.raw_connection()
cur = conn.cursor()
output = io.StringIO()
df.to_csv(output, sep='\t', header=False, index=False)
output.seek(0)
contents = output.getvalue()  # not needed for the copy; copy_from reads the buffer directly
cur.copy_from(output, 'table_name', null="")  # null values become ''
conn.commit()
cur.close()
conn.close()
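If the table lives in a non-default schema, a hedged variant is to pass schema= to to_sql and switch to copy_expert, which takes a full COPY statement and therefore accepts a schema-qualified name (the schema and table names below are placeholders, and the schema must already exist in the database):

df.head(0).to_sql('table_name', engine, if_exists='replace',
                  index=False, schema='my_schema')
conn = engine.raw_connection()
cur = conn.cursor()
output = io.StringIO()
df.to_csv(output, sep='\t', header=False, index=False)
output.seek(0)
# NULL '' keeps the same "empty string means NULL" behaviour as copy_from(..., null="")
cur.copy_expert("COPY my_schema.table_name FROM STDIN WITH (FORMAT text, NULL '')", output)
conn.commit()
cur.close()
conn.close()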
Pandas 0.24.0+ solution
In pandas 0.24.0 a new feature was introduced that is specifically designed for fast writes to Postgres. You can learn more about it here: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method
import csv
from io import StringIO
from sqlalchemy import create_engine

def psql_insert_copy(table, conn, keys, data_iter):
    # gets a DBAPI connection that can provide a cursor
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        s_buf = StringIO()
        writer = csv.writer(s_buf)
        writer.writerows(data_iter)
        s_buf.seek(0)

        columns = ', '.join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = '{}.{}'.format(table.schema, table.name)
        else:
            table_name = table.name

        sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(
            table_name, columns)
        cur.copy_expert(sql=sql, file=s_buf)

engine = create_engine('postgresql://myusername:mypassword@myhost:5432/mydatabase')
df.to_sql('table_name', engine, method=psql_insert_copy)
This is how I did it.
It may be faster because it is using execute_batch:
import psycopg2.extras

# df is the dataframe; table is the target table name, conn an open psycopg2 connection
if len(df) > 0:
    df_columns = list(df)
    # create (col1,col2,...)
    columns = ",".join(df_columns)

    # create VALUES('%s', '%s',...) one '%s' per column
    values = "VALUES({})".format(",".join(["%s" for _ in df_columns]))

    # create INSERT INTO table (columns) VALUES('%s',...)
    insert_stmt = "INSERT INTO {} ({}) {}".format(table, columns, values)

    cur = conn.cursor()
    psycopg2.extras.execute_batch(cur, insert_stmt, df.values)
    conn.commit()
    cur.close()
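As an aside, execute_batch also accepts a page_size argument (default 100); raising it can reduce the number of server round trips, for example:

# same call as above, with a larger batch page size
psycopg2.extras.execute_batch(cur, insert_stmt, df.values, page_size=1000)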
Faster way to write a df to a table in a custom schema with/without index:
""" Faster way to write df to table. Slower way is to use df.to_sql() """ from io import StringIO from pandas import DataFrame from sqlalchemy.engine.base import Engine class WriteDfToTableWithIndexMixin: @classmethod def write_df_to_table_with_index( cls, df: DataFrame, table_name: str, schema_name: str, engine: Engine ): """ Truncate existing table and load df into table. Keep each column as string to avoid datatype conflicts. """ df.head(0).to_sql(table_name, engine, if_exists='replace', schema=schema_name, index=True, index_label='id') conn = engine.raw_connection() cur = conn.cursor() output = StringIO() df.to_csv(output, sep='\t', header=False, index=True, index_label='id') output.seek(0) contents = output.getvalue() cur.copy_expert(f"COPY {schema_name}.{table_name} FROM STDIN", output) conn.commit() class WriteDfToTableWithoutIndexMixin: @classmethod def write_df_to_table_without_index( cls, df: DataFrame, table_name: str, schema_name: str, engine: Engine ): """ Truncate existing table and load df into table. Keep each column as string to avoid datatype conflicts. """ df.head(0).to_sql(table_name, engine, if_exists='replace', schema=schema_name, index=False) conn = engine.raw_connection() cur = conn.cursor() output = StringIO() df.to_csv(output, sep='\t', header=False, index=False) output.seek(0) contents = output.getvalue() cur.copy_expert(f"COPY {schema_name}.{table_name} FROM STDIN", output) conn.commit() If you have JSON values in a column in your df then above method will still load all data correctly but the json column will have some weird format. So converting that json column to ::json may generate error. You have to use to_sql() . Add method=multi to speed things up and add chunksize to prevent your machine from freezing:
df.to_sql(table_name, engine, if_exists='replace', schema=schema_name,
          index=False, method='multi', chunksize=1000)
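For what it's worth, a usage sketch for the mixin above might look like this (the Loader class name, connection string and schema are made up, and the schema must already exist):

from sqlalchemy import create_engine
from pandas import DataFrame

class Loader(WriteDfToTableWithoutIndexMixin):
    pass

engine = create_engine('postgresql://username:password@localhost:5432/mydatabase')
df = DataFrame({'id': [1, 2], 'name': ['a', 'b']})
Loader.write_df_to_table_without_index(df, 'table_name', 'my_schema', engine)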
Using psycopg2 you can use native SQL commands to write data into a Postgres table.
import psycopg2
import pandas as pd

conn = psycopg2.connect("dbname='{db}' user='{user}' host='{host}' port='{port}' password='{passwd}'".format(
    user=pg_user,
    passwd=pg_pass,
    host=pg_host,
    port=pg_port,
    db=pg_db))
cur = conn.cursor()

def insertIntoTable(df, table):
    """
    Using cursor.executemany() to insert the dataframe
    """
    # Create a list of tuples from the dataframe values
    tuples = list(set([tuple(x) for x in df.to_numpy()]))

    # Comma-separated dataframe columns
    cols = ','.join(list(df.columns))

    # SQL query to execute, with one %s placeholder per column
    placeholders = ','.join(['%s'] * len(df.columns))
    query = "INSERT INTO %s(%s) VALUES(%s)" % (table, cols, placeholders)

    try:
        cur.executemany(query, tuples)
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        conn.rollback()
        return 1
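A minimal usage sketch (the table and column names are placeholders; the target table must already exist and its columns must match the df):

df = pd.DataFrame({'id': [1, 2], 'name': ['foo', 'bar']})
insertIntoTable(df, 'my_table')
cur.close()
conn.close()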
For Python 2.7 and Pandas 0.24.2 and using Psycopg2
Psycopg2 Connection Module
import psycopg2
from psycopg2.extras import RealDictCursor

def dbConnect(db_parm, username_parm, host_parm, pw_parm):
    # Parse in connection information
    credentials = {'host': host_parm, 'database': db_parm,
                   'user': username_parm, 'password': pw_parm}
    conn = psycopg2.connect(**credentials)
    conn.autocommit = True  # auto-commit each entry to the database
    conn.cursor_factory = RealDictCursor
    cur = conn.cursor()
    print("Connected Successfully to DB: " + str(db_parm) + "@" + str(host_parm))
    return conn, cur

Connect to the database
conn, cur = dbConnect(databaseName, dbUser, dbHost, dbPwd)

Assuming the dataframe is already present as df:
import io

output = io.BytesIO()  # For Python 3 use io.StringIO()
df.to_csv(output, sep='\t', header=True, index=False)
output.seek(0)  # Required for rewinding the buffer before the copy
copy_query = "COPY mem_info FROM STDIN csv DELIMITER '\t' NULL '' ESCAPE '\\' HEADER "  # Put your table name in place of mem_info
cur.copy_expert(copy_query, output)
conn.commit()
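Optionally, a quick sanity check after the load (using the same placeholder table name as above):

cur.execute("SELECT COUNT(*) FROM mem_info")
print(cur.fetchone())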
Create an engine (where dialect='postgresql' or 'mysql', etc.):
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import pandas as pd

engine = create_engine(f'{dialect}://{user_name}@{host}:{port}/{db_name}')
Session = sessionmaker(bind=engine)

with Session() as session:
    df = pd.read_csv(path + f'/{file}')
    df.to_sql('table_name', con=engine, if_exists='append', index=False)