
Why are batch inserts faster? Is it because the connection and setup overhead for inserting a single row is the same for a set of rows? What other factors make batch inserts faster?

How do batch updates work? Assuming the table has no uniqueness constraints, insert statements don't really have any effect on other insert statements in the batch. However, during batch updates, an update can alter the state of the table and hence can affect the outcome of other update queries in the batch.

I know that batch insert queries have a syntax where you have all the insert values in one big query. What do batch update queries look like? For example, if I have single update queries of the form:

update <table> set <column>=<expression> where <condition1>
update <table> set <column>=<expression> where <condition2>
update <table> set <column>=<expression> where <condition3>
update <table> set <column>=<expression> where <condition4>

What happens when they are used in a batch? What will the single query look like?

And are batch inserts & updates part of the SQL standard?

4 Answers


I was looking for an answer on the same subject, about "bulk/batch" update. People often describe the problem by comparing it to the INSERT clause with multiple value sets (the "bulk" part):

INSERT INTO mytable (mykey, mytext, myint)
VALUES
    (1, 'text1', 11),
    (2, 'text2', 22),
    ...

A clear answer was still eluding me, but I found the solution here: http://www.postgresql.org/docs/9.1/static/sql-values.html

To make it clear:

UPDATE mytable
SET mytext = myvalues.mytext,
    myint  = myvalues.myint
FROM (
    VALUES
        (1, 'textA', 99),
        (2, 'textB', 88),
        ...
) AS myvalues (mykey, mytext, myint)
WHERE mytable.mykey = myvalues.mykey

It has the same property of being "bulk", i.e. containing a lot of data in one statement.
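To see the whole thing end to end, here is a minimal, self-contained sketch of the same technique (the table and values are made up for illustration):

CREATE TABLE mytable (mykey integer PRIMARY KEY, mytext text, myint integer);
INSERT INTO mytable VALUES (1, 'text1', 11), (2, 'text2', 22);

-- Update both rows with a single statement
UPDATE mytable
SET mytext = myvalues.mytext,
    myint  = myvalues.myint
FROM (
    VALUES (1, 'textA', 99),
           (2, 'textB', 88)
) AS myvalues (mykey, mytext, myint)
WHERE mytable.mykey = myvalues.mykey;

One gotcha worth knowing: PostgreSQL infers the column types of a VALUES list from its contents, so if a column contains only NULLs you may need an explicit cast such as NULL::integer.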


1 Comment

This is a tremendous answer. I used this here: stackoverflow.com/questions/55052395/…

Why are batch inserts faster?

For numerous reasons, but the major three are these:

  • The query doesn't need to be reparsed.
  • The values are transmitted in one round-trip to the server.
  • The commands are inside a single transaction.
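As a rough illustration of the first and third points, compare row-by-row inserts with a single multi-row statement wrapped in one transaction (the table is hypothetical):

-- One statement per row: parsed repeatedly and,
-- in autocommit mode, committed once per row
INSERT INTO mytable VALUES (1, 'a');
INSERT INTO mytable VALUES (2, 'b');
INSERT INTO mytable VALUES (3, 'c');

-- Batched: parsed once, sent in one round-trip, committed once
BEGIN;
INSERT INTO mytable VALUES
    (1, 'a'),
    (2, 'b'),
    (3, 'c');
COMMIT;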

Is it because the connection and setup overhead for inserting a single row is the same for a set of rows?

Partially yes, see above.

How do batch updates work?

This depends on the RDBMS.

In Oracle you can transmit all values as a collection and use this collection as a table in a JOIN.

In PostgreSQL and MySQL, you can use the following syntax:

INSERT INTO mytable VALUES (value1), (value2), … 

You can also prepare a query once and call it in some kind of a loop. Usually there are methods to do this in a client library.
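For example, PostgreSQL exposes this directly at the SQL level via PREPARE/EXECUTE; client libraries typically wrap the same mechanism (the statement name and table are made up):

-- Parse and plan once
PREPARE ins (integer, text) AS
    INSERT INTO mytable VALUES ($1, $2);

-- Execute many times without reparsing
EXECUTE ins(1, 'a');
EXECUTE ins(2, 'b');
EXECUTE ins(3, 'c');

DEALLOCATE ins;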

Assuming the table has no uniqueness constraints, insert statements don't really have any effect on other insert statements in the batch. But during batch updates, an update can alter the state of the table and hence can affect the outcome of other update queries in the batch.

Yes, and you may or may not benefit from this behavior.

I know that batch insert queries have a syntax where you have all the insert values in one big query. What do batch update queries look like?

In Oracle, you use a collection in a join:

MERGE INTO mytable
USING TABLE(:mycol)
ON …
WHEN MATCHED THEN UPDATE
SET …
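A fuller, hypothetical sketch of the Oracle approach, assuming a SQL-level collection type and a client driver that can bind collections (all names here are illustrative, not from the original answer):

-- SQL-level types that the collection bind relies on
CREATE TYPE mytable_row AS OBJECT (mykey NUMBER, mytext VARCHAR2(100));
CREATE TYPE mytable_tab AS TABLE OF mytable_row;

-- :rows is bound to a mytable_tab collection by the client
MERGE INTO mytable t
USING (SELECT * FROM TABLE(:rows)) s
ON (t.mykey = s.mykey)
WHEN MATCHED THEN UPDATE SET t.mytext = s.mytext;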

In PostgreSQL:

UPDATE mytable
SET s_start = 1
FROM (
    VALUES (value1), (value2), …
) q
WHERE …

6 Comments

Could you please explain how to use the last specified statement? I don't quite understand it, however potentially it's something I've been looking for.
@Quassnoi I think you could improve the post by better explaining the difference between "Batch Prepared Statements" and Multi-Row Inserts/Updates (and/or the combination of the two).
I guess the OP is talking about JDBC batching (Statement.addBatch() and Statement.executeBatch()) rather than DBMS-specific syntax
@a_horse_with_no_name: "What will the single query look like" - this looks like DBMS-specific to me. Nice necro comment though, I remember answering that on a lake beach!
This explains a little about parsing. docs.oracle.com/cd/B28359_01/server.111/b28318/…

The other posts explain why bulk statements are faster and how to do it with literal values.

I think it is important to know how to do it with placeholders. Not using placeholders may lead to gigantic command strings, to quoting/escaping bugs and thereby to applications that are prone to SQL injection.

Bulk insert with placeholders in PostgreSQL >= 9.1

To insert an arbitrary number of rows into table "mytable", consisting of columns "col1", "col2" and "col3", all in one go (one statement, one transaction):

INSERT INTO mytable (col1, col2, col3)
VALUES (unnest(?), unnest(?), unnest(?))

You need to supply three arguments to this statement. The first one has to contain all the values for the first column and so on. Consequently, all the arguments have to be lists/vectors/arrays of equal length.
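To make the shape of the arguments concrete, here is the same idea with array literals substituted for the placeholders (the values are made up). Recent PostgreSQL releases are stricter about set-returning functions outside a SELECT list, so the SELECT spelling is the safer one:

INSERT INTO mytable (col1, col2, col3)
SELECT unnest(ARRAY[1, 2, 3]),
       unnest(ARRAY['a', 'b', 'c']),
       unnest(ARRAY[10, 20, 30]);
-- inserts the rows (1, 'a', 10), (2, 'b', 20), (3, 'c', 30)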

Bulk update with placeholders in PostgreSQL >= 9.1

Let's say, your table is called "mytable". It consists of the columns "key" and "value".

update mytable
set value = data_table.new_value
from (
    select unnest(?) as key,
           unnest(?) as new_value
) as data_table
where mytable.key = data_table.key

I know, this is not easy to understand. It looks like obfuscated SQL. On the other hand: it works, it scales, it works without any string concatenation, it is safe, and it is blazingly fast.

You need to supply two arguments to this statement. The first one has to be a list/vector/array that contains all the values for column "key". Of course, the second one has to contain all the values for column "value".
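Filled in with array literals instead of placeholders, the statement looks like this (the keys and values are made up):

update mytable
set value = data_table.new_value
from (
    select unnest(ARRAY[1, 2, 3])       as key,
           unnest(ARRAY['a', 'b', 'c']) as new_value
) as data_table
where mytable.key = data_table.key;

The two unnest calls expand in lockstep, so the subquery yields the rows (1, 'a'), (2, 'b') and (3, 'c').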

In case you hit size limits, you may have to look into COPY ... FROM STDIN (PostgreSQL).
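A minimal psql-style sketch of that route, assuming the same two-column table (the data lines are fed via standard input and terminated with \.):

COPY mytable (key, value) FROM STDIN WITH (FORMAT csv);
1,alpha
2,beta
\.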


In a batch update, the database works against a set of data; in a row-by-row update it has to run the same command as many times as there are rows. So if you insert a million rows in a batch, the command is sent and processed once, while in a row-by-row update it is sent and processed a million times. This is also why you never want to use a cursor or a correlated subquery in SQL Server.

An example of a set-based update in SQL Server:

update mytable set myfield = 'test' where myfield is null 

This would update all 1 million records that are null in one step. A cursor update (which is how you would update a million rows in a non-batch fashion) would iterate through each row one at a time and update it.
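For contrast, here is a sketch of the row-by-row cursor version this answer warns against (T-SQL; the key column "id" is hypothetical):

DECLARE @id int;
DECLARE cur CURSOR FOR
    SELECT id FROM mytable WHERE myfield IS NULL;
OPEN cur;
FETCH NEXT FROM cur INTO @id;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- one UPDATE sent and processed per row
    UPDATE mytable SET myfield = 'test' WHERE id = @id;
    FETCH NEXT FROM cur INTO @id;
END
CLOSE cur;
DEALLOCATE cur;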

The problem with a batch insert is the size of the batch. If you try to update too many records at once, the database may lock the table for the duration of the process, locking all other users out. So you may need to run a loop that takes only part of the batch at a time (though pretty much any chunk size greater than one row will be faster than one row at a time). This is slower than updating, inserting, or deleting the whole batch in one statement, but faster than row-by-row operations, and it may be needed in a production environment with many users and little available downtime during which users are not trying to see and update other records in the same table.

The right batch size depends greatly on the database structure and on exactly what is happening: tables with triggers and lots of constraints are slower to modify, as are tables with lots of fields, and so they require smaller batches.
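One common way to process such a batch in chunks in SQL Server is UPDATE TOP in a loop; the chunk size of 10000 below is an arbitrary illustration:

WHILE 1 = 1
BEGIN
    -- update at most 10000 rows per iteration, keeping locks short-lived
    UPDATE TOP (10000) mytable
    SET myfield = 'test'
    WHERE myfield IS NULL;

    IF @@ROWCOUNT = 0 BREAK;
END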

1 Comment

The idea that large updates will lock the users out is only true either with bad databases or with bad application developers. SQL Server has provided the standard four transaction isolation levels since V7.0; you have to do something outright wrong to block anything by inserting data.
