
I have a table with a unique constraint on one of its fields. I need to insert a large number of records into this table. To make it faster I'm using batch updates with JDBC (driver version is 8.3-603). Is there a way to do the following:

  • on every batch execute, write into the table all the records from the batch that don't violate the unique index;

  • on every batch execute, receive the records from the batch that were not inserted into the DB, so I could save the "wrong" records?

  • No, that's not possible. You need to write the rows into an intermediate table (without constraints) first and then use SQL to copy the valid ones into the target table. Btw: why are you using such an outdated driver version? Commented Jun 20, 2015 at 8:52
  • I'd like to use a newer version, but it isn't my decision :( How can I use an intermediate table more efficiently? Copy line-by-line and delete what was copied, then at the end read the records left in that table? Commented Jun 20, 2015 at 9:05
  • Look up "bulk upsert" Commented Jun 20, 2015 at 10:26

1 Answer


The most efficient way of doing this would be something like this:

  • create a staging table with the same structure as the target table but without the unique constraint
  • batch insert all rows into that staging table. The most efficient way is to use COPY or the CopyManager (although I don't know if that is already supported in your ancient driver version); see the sketch below.
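
For the bulk load into the staging table, here is a minimal sketch using the driver's CopyManager. The connection URL, credentials, and the staging_table columns (id, col_1, col_2) are assumptions, and the cast to BaseConnection only works on a plain PostgreSQL JDBC connection (not a pooled wrapper):

import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;

public class StagingCopy {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password")) {

            CopyManager copyManager = new CopyManager((BaseConnection) conn);

            // Rows as CSV text; in a real loader this would be streamed from your data source
            String csv = "1,foo,bar\n2,baz,qux\n";

            // COPY pushes the rows in one stream, which is much faster than
            // individual INSERTs or even JDBC batches
            long rowsCopied = copyManager.copyIn(
                    "copy staging_table (id, col_1, col_2) from stdin with csv",
                    new StringReader(csv));

            System.out.println("Copied " + rowsCopied + " rows into staging_table");
        }
    }
}

If CopyManager turns out not to be available in the 8.3-603 driver, a regular PreparedStatement with addBatch()/executeBatch() against the staging table works as well, just more slowly.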

Once that is done you copy the valid rows into the target table:

insert into target_table (id, col_1, col_2)
select id, col_1, col_2
from staging_table
where not exists (select *
                  from target_table
                  where target_table.id = staging_table.id);

Note that the above is not concurrency safe! If other processes do the same thing you might still get unique key violations. To prevent that you need to lock the target table.
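
To illustrate the locking, here is a hedged sketch over JDBC; the lock mode and the table/column names are assumptions. EXCLUSIVE mode still allows concurrent reads but blocks other writers until the transaction commits:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class LockedInsert {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password")) {
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement()) {
                // Held until commit/rollback; prevents another loader from inserting
                // the same keys between the NOT EXISTS check and the insert
                st.execute("lock table target_table in exclusive mode");
                st.executeUpdate(
                        "insert into target_table (id, col_1, col_2) " +
                        "select id, col_1, col_2 from staging_table " +
                        "where not exists (select * from target_table " +
                        "                  where target_table.id = staging_table.id)");
                conn.commit();
            } catch (SQLException e) {
                conn.rollback();
                throw e;
            }
        }
    }
}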

If you want to remove the copied rows, you could do that using a writeable CTE:

with inserted as (
  insert into target_table (id, col_1, col_2)
  select id, col_1, col_2
  from staging_table
  where not exists (select *
                    from target_table
                    where target_table.id = staging_table.id)
  returning id
)
delete from staging_table
where id in (select id from inserted);

A (non-unique) index on staging_table.id should help performance.
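
Assuming the staging-table approach above, whatever is left in staging_table after the CTE has run is exactly the set of "wrong" records the question asks to save; a sketch of reading them back over JDBC (table and column names assumed):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ReadRejectedRows {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                     "select id, col_1, col_2 from staging_table")) {
            while (rs.next()) {
                // Log or persist the rejected record elsewhere
                System.out.printf("rejected: id=%d, col_1=%s, col_2=%s%n",
                        rs.getLong("id"), rs.getString("col_1"), rs.getString("col_2"));
            }
        }
    }
}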


1 Comment

Thanks, looks good. How fast would this be? Much slower than just batch inserting straight into the target table?
