4

I am trying to populate an MS SQL Server 2005 database using Python on Windows. I am inserting millions of rows, and by about 7 million rows I am using almost a gigabyte of memory. The test below eats up roughly 4 MB of RAM for every 100k rows inserted:

    import pyodbc

    connection = pyodbc.connect('DRIVER={SQL Server};SERVER=x;DATABASE=x;UID=x;PWD=x')
    cursor = connection.cursor()
    connection.autocommit = True

    while 1:
        cursor.execute("insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", 1, 2, 3, 4, 5, 6)

    connection.close()

Hack solution: I ended up spawning a new process with the multiprocessing module so that the memory gets handed back to the OS when the process exits. I am still confused about why inserting rows this way consumes so much memory. Any ideas?
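
For reference, roughly what the hack looks like (the connection string, table, and batch size are placeholders, not my real ones); the point is simply that the inserts happen in a child process, so whatever the driver holds on to is returned to the OS when that process exits:

    import pyodbc
    from multiprocessing import Process

    CONN_STR = 'DRIVER={SQL Server};SERVER=x;DATABASE=x;UID=x;PWD=x'   # placeholder

    def insert_batch(rows):
        # Everything pyodbc allocates lives inside this child process,
        # so it is handed back to the OS when the process exits.
        connection = pyodbc.connect(CONN_STR)
        connection.autocommit = True
        cursor = connection.cursor()
        for row in rows:
            cursor.execute("insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", *row)
        connection.close()

    if __name__ == '__main__':                      # required for multiprocessing on Windows
        batch = [(1, 2, 3, 4, 5, 6)] * 100000       # placeholder data
        worker = Process(target=insert_batch, args=(batch,))
        worker.start()
        worker.join()                               # the child's memory is released here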

3
  • Have you tried manually committing the transactions? It looks a bit like none of this is being committed to the db. Commented Nov 3, 2010 at 15:59
  • Thanks. Setting connection.autocommit=False and doing a manual commit with connection.commit() has no effect on memory usage. Commented Nov 3, 2010 at 16:04
  • Was this ever solved? I'm getting the same problem. Commented Apr 9, 2020 at 6:31

5 Answers

9

I had the same issue, and it looks like a pyodbc issue with parameterized inserts: http://code.google.com/p/pyodbc/issues/detail?id=145

As a temporary workaround, switching to a static insert with the VALUES clause populated inline eliminates the leak, until I can try a build from the current source.
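
Roughly what I mean by a static insert, as a sketch against the question's placeholder table (note that inlining values like this skips parameter escaping, so it is only reasonable for data you control):

    import pyodbc

    connection = pyodbc.connect('DRIVER={SQL Server};SERVER=x;DATABASE=x;UID=x;PWD=x')  # placeholder
    connection.autocommit = True
    cursor = connection.cursor()

    # Inline the values instead of using ? placeholders, which is what
    # appears to trigger the leak in the affected pyodbc versions.
    row = (1, 2, 3, 4, 5, 6)  # placeholder data
    cursor.execute("insert into x (a,b,c,d,e,f) VALUES (%d,%d,%d,%d,%d,%d)" % row)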


2 Comments

This solved it. Wish I had 15 points so I could vote this up. Thanks a lot!
It appears the latest code resolves the issue without a workaround.
1

I faced the same problem, too.

I had to read more than 50 XML files each about 300 MB and load them into SQL Server 2005.

I tried the following:

  • Using the same cursor and dereferencing it each time.
  • Closing and reopening the connection.
  • Setting the connection to None.

Finally, I ended up bootstrapping each XML file load in its own process using the Process module.

I have since replaced that process with IronPython and System.Data.SqlClient.

This gives better performance and a nicer interface.
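
Roughly what the IronPython side looks like, as a sketch rather than my actual code (the connection string and data are placeholders, using the standard ADO.NET SqlClient calls):

    # IronPython sketch: talk to SQL Server through ADO.NET instead of pyodbc.
    import clr
    clr.AddReference('System.Data')
    from System.Data.SqlClient import SqlConnection, SqlCommand

    conn = SqlConnection('Server=x;Database=x;User Id=x;Password=x;')  # placeholder connection string
    conn.Open()
    cmd = SqlCommand("insert into x (a,b,c,d,e,f) VALUES (@a,@b,@c,@d,@e,@f)", conn)
    for name, value in zip('abcdef', (1, 2, 3, 4, 5, 6)):              # placeholder data
        cmd.Parameters.AddWithValue('@' + name, value)
    cmd.ExecuteNonQuery()
    conn.Close()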


0

Maybe close & re-open the connection every million rows or so?

Sure it doesn't solve anything, but if you only have to do this once you could get on with life!
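
Something like this, just as a sketch (connection string, data, and the one-million interval are all placeholders):

    import pyodbc

    CONN_STR = 'DRIVER={SQL Server};SERVER=x;DATABASE=x;UID=x;PWD=x'  # placeholder
    rows = [(1, 2, 3, 4, 5, 6)] * 10000                               # placeholder data

    connection = pyodbc.connect(CONN_STR)
    cursor = connection.cursor()
    for i, row in enumerate(rows):
        if i and i % 1000000 == 0:
            # tear the connection down and rebuild it every million rows
            connection.close()
            connection = pyodbc.connect(CONN_STR)
            cursor = connection.cursor()
        cursor.execute("insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", *row)
    connection.close()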

1 Comment

Thanks. I have tried connection.close() and connection=pyodbc.connect() every 10,000 inserts. It looks like memory usage goes up if anything.
0

Try creating a separate cursor for each insert. Reuse the cursor variable each time through the loop to implicitly dereference the previous cursor, and add a connection.commit() after each insert.

You may only need something as simple as a time.sleep(0) at the bottom of each loop to allow the garbage collector to run.
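
As a sketch of what I mean, with the connection string and data as placeholders:

    import time
    import pyodbc

    connection = pyodbc.connect('DRIVER={SQL Server};SERVER=x;DATABASE=x;UID=x;PWD=x')  # placeholder
    connection.autocommit = False

    for row in [(1, 2, 3, 4, 5, 6)] * 10000:       # placeholder data
        cursor = connection.cursor()               # fresh cursor; the previous one is dereferenced
        cursor.execute("insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", *row)
        connection.commit()                        # commit after each insert
        time.sleep(0)                              # yield so the garbage collector gets a chance to run

    connection.close()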

2 Comments

Thanks, freegnu. Creating a separate cursor doesn't have any effect. I tried time.sleep(1) after each 1000 inserts, and that didn't have any effect either -- same for time.sleep(0) after each one.
I use pymssql for development and don't see half the problems and limitations I see when using mx.odbc.windows in production. I'm guessing pyodbc is problematic as well. You might want to give pymssql a try.
0

You could also try forcing a garbage collection every once in a while with gc.collect() after importing the gc module.

Another option might be to use cursor.executemany() and see if that clears up the problem. The nasty thing about executemany(), though, is that it takes a sequence rather than an iterator (so you can't pass it a generator). I'd try the garbage collector first.
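
For example, something along these lines (connection string, batch size, and data are placeholders):

    import gc
    import pyodbc

    connection = pyodbc.connect('DRIVER={SQL Server};SERVER=x;DATABASE=x;UID=x;PWD=x')  # placeholder
    connection.autocommit = True
    cursor = connection.cursor()

    # executemany() wants a real sequence of parameter tuples, not a generator
    params = [(1, 2, 3, 4, 5, 6)] * 10000          # placeholder data
    for _ in range(100):
        cursor.executemany("insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", params)
        gc.collect()                               # force a collection between batches

    connection.close()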

EDIT: I just tested the code you posted, and I am not seeing the same issue. Are you using an old version of pyodbc?

