4

I need to insert and delete records in table 1 based on the data in multiple columns in table 2.

Example: Table1 and Table2 have identical columns: address, city, state, and zip.

Table1 is the master table with millions and millions of records. Table2 (thousands of records) is a table based on data that changed in some way - although it may not be a change in data in this table.

Matching records means that the address, city, state, and zip are identical in value in both tables.

So I want to do 2 operations:

  • Insert records into Table1 that only exist in Table2
  • Delete records from Table1 where the record is not in Table2 but the city is in Table2

Based on the size of Table1, I need a very efficient way to execute this. I know some SQL - but not to the level this would require if done with SQL.

This is a DB2 database - version 10.5.

1 Answer 1

5
  • Operation 1

    Insert records into Table1 that only exist in Table2:

    INSERT INTO Table1 (address, city, state, zip) SELECT address, city, state, zip FROM Table2 EXCEPT SELECT address, city, state, zip FROM Table1 ; 
  • Operation 2

    Delete records from Table1 where the record is not in Table2 but the city is in Table2.

    (assuming that the columns ar enot nullable. If they allow nulls, it would be better to rewrite the NOT IN with a NOT EXISTS subquery):

    DELETE FROM Table1 WHERE (address, city, state, zip) NOT IN (SELECT address, city, state, zip FROM Table2) AND city IN (SELECT city FROM Table2) ; 

As for performance and efficiency, I have almost zero experience with DB2, so it would be better if you waited for some other answer. But an index on all the columns (address, city, state, zip) - on both tables - would probably be the best you can do for the queries, plus a single index on (city). Or perhaps combine the two suggestions in one index with all the 4 columns but choose city for the 1st part of the index: (city, address, state, zip).

You haven't provided the tables' structure but the 4 columns seem appropriate for a UNIQUE constraint so you may already have such an index.

2
  • Thanks so much. I actually have a table with multiple other columns and the columns are all different than the example I gave - but the example described the problem easily and your solution should do the trick. Thanks again Commented Jul 23, 2015 at 13:31
  • @Chris: I'm afraid that syntax is Postgres specific, but DB2 does have a similar means of selecting from DML statements which could be used here (see data change table reference) (BEGIN is also a pg-ism). On the subject of performance I generally found DB2 to be happier with EXCEPT than NOT IN but ymmv. Commented Jul 23, 2015 at 21:00

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.