1

I have a large table (> 50M rows) that I'm trying to convert into a new table format. As part of this, the row ids returned from the first insert need to be used in a second insert into a second table.

I am using a CTE that looks something like this:

WITH inserted_data AS ( INSERT INTO table1 (column1, column2, column3, column4) SELECT value1, value2, value3, value4 FROM original_table RETURNING rowid, column4 ) INSERT INTO table2 (table1_id, column4) SELECT rowid, column4 FROM inserted_data; 

The problem is, this is unacceptably slow. For only 4800 rows it takes 21 seconds, while for 9600 rows it takes around 50 seconds. At this rate I expect it would take around 3 days for 50 million records. I have a time constraint of around 2 hours.

If I perform just the first insert (no CTE), the query is 100x times faster, around 200 milliseconds and 500 milliseconds for 4800 and 9600 rows. I know that the time for the second insert would also be negligible. At this rate the query would finish in the allotted time.

The question is, how can I rewrite the query to perform at the rate the individual queries could complete in. Shuffling all the data out of the database into an external program would be a hassle and error prone as well as require extra resources. If I do something like write a function with a loop then I'm not inserting in one go and i expect poor performance there as well. I don't think using a temporary table will help, because the issue seems to be with the mere existence of the CTE.

I tried this:

INSERT INTO table2 (table1_id, column4) SELECT rowid, column4 FROM ( WITH inserted_data AS ( INSERT INTO table1 (column1, column2, column3, column4) SELECT value1, value2, value3, value4 FROM original_table RETURNING rowid, column4 ) ) 

But this gives me:

syntax error at or near "INTO" 

1 Answer 1

1

How about separating the inserts in two statements? The second insert could look like:

INSERT INTO table2 (table1_id, column4) SELECT t1.rowid , ot.column4 FROM original_table ot JOIN table1 t1 ON t1.column1 = ot.value1 AND t1.column2 = ot.value2 AND t1.column3 = ot.value3 AND t1.column4 = ot.value4 

For each row in original_table, the join conditions looks for the row inserted into table1.

Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.