I'm working on quite a complex problem for which I need your advice. I need to copy data from one table to another, and I know there are quite a few solutions for this, such as:
If both have the same schema:

INSERT INTO newTable
SELECT * FROM oldTable

If they have different schemas:

INSERT INTO newTable (col1, col2, col3)
SELECT column1, column2, column3
FROM oldTable

A SQL cursor can also be used, and a lot more variations exist. But let me summarize the problem:
The problem
I have a CSV file which contains approximately 1.5 million records. Each record currently has 6 fields which are being imported.
To insert the data from the CSV file into SQL Server, I'm using C# in combination with Entity Framework.
For performance, I first insert all those records into a temporary table. This is the schema of the temporary table:
CREATE TABLE [dbo].[TEMP_GENERIC_ARTICLE](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [GlnCode] [nvarchar](100) NULL,
    [Description] [nvarchar](max) NULL,
    [VendorId] [nvarchar](100) NULL,
    [VendorName] [nvarchar](100) NULL,
    [ItemNumber] [nvarchar](100) NULL,
    [ItemUOM] [nvarchar](max) NULL,
    [DateCreatedInternal] [datetime] NOT NULL,
    [DateUpdatedInternal] [datetime] NOT NULL,
    CONSTRAINT [PK_dbo.TEMP_GENERIC_ARTICLE] PRIMARY KEY CLUSTERED ([Id] ASC)
)

Then I have a table which another application will consume, called T_GENERIC_ARTICLE, whose schema is:
CREATE TABLE [dbo].[T_GENERIC_ARTICLE](
    [GlnCode] [nvarchar](100) NOT NULL,
    [Description] [nvarchar](max) NULL,
    [VendorId] [nvarchar](100) NOT NULL,
    [VendorName] [nvarchar](100) NULL,
    [ItemNumber] [nvarchar](100) NOT NULL,
    [ItemUOM] [nvarchar](128) NOT NULL,
    CONSTRAINT [PK_dbo.T_GENERIC_ARTICLE] PRIMARY KEY CLUSTERED (
        [GlnCode] ASC,
        [VendorId] ASC,
        [ItemNumber] ASC,
        [ItemUOM] ASC
    )
)

So the real table no longer has the 'Id' field, and its primary key spans 4 columns.
Now, what I would like to do:
As soon as the data is stored in the temp table (or every 1,000 records, for example), I need to run a SQL stored procedure which copies the data from the temp table to the destination table (a rough sketch follows below).
During the copy, I need to check whether a record with that primary key already exists. If it does, I would like to update the record; otherwise I want to insert a new one.
After the copy has fully completed, I would like to remove all records from the temp table.
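To make the intent concrete, here is a minimal sketch of the kind of stored procedure I have in mind, assuming a MERGE-based upsert (the procedure name and the NULL/duplicate filtering are my own illustration, not existing code):

CREATE PROCEDURE [dbo].[usp_CopyGenericArticles]  -- hypothetical name
AS
BEGIN
    SET NOCOUNT ON;

    -- The destination key columns are NOT NULL while the temp table allows NULLs,
    -- so rows with missing key values are filtered out. Duplicate keys inside the
    -- temp table must also be removed, or MERGE fails when it hits the same target
    -- row twice; here the most recently inserted row (highest Id) wins.
    WITH Source AS (
        SELECT [GlnCode], [Description], [VendorId], [VendorName], [ItemNumber], [ItemUOM],
               ROW_NUMBER() OVER (PARTITION BY [GlnCode], [VendorId], [ItemNumber], [ItemUOM]
                                  ORDER BY [Id] DESC) AS rn
        FROM [dbo].[TEMP_GENERIC_ARTICLE]
        WHERE [GlnCode] IS NOT NULL AND [VendorId] IS NOT NULL
          AND [ItemNumber] IS NOT NULL AND [ItemUOM] IS NOT NULL
    )
    MERGE [dbo].[T_GENERIC_ARTICLE] AS target
    USING (SELECT [GlnCode], [Description], [VendorId], [VendorName], [ItemNumber], [ItemUOM]
           FROM Source WHERE rn = 1) AS source
       ON  target.[GlnCode]    = source.[GlnCode]
       AND target.[VendorId]   = source.[VendorId]
       AND target.[ItemNumber] = source.[ItemNumber]
       AND target.[ItemUOM]    = source.[ItemUOM]
    WHEN MATCHED THEN
        UPDATE SET target.[Description] = source.[Description],
                   target.[VendorName]  = source.[VendorName]
    WHEN NOT MATCHED THEN
        INSERT ([GlnCode], [Description], [VendorId], [VendorName], [ItemNumber], [ItemUOM])
        VALUES (source.[GlnCode], source.[Description], source.[VendorId],
                source.[VendorName], source.[ItemNumber], source.[ItemUOM]);

    -- Empty the temp table once the copy has completed. TRUNCATE is minimally
    -- logged and also resets the IDENTITY counter, unlike DELETE.
    TRUNCATE TABLE [dbo].[TEMP_GENERIC_ARTICLE];
END

One thing I noticed while sketching this: ItemUOM is nvarchar(max) in the temp table but nvarchar(128) in the destination, so values longer than 128 characters would make the insert fail.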
The question
What's the best approach to work with such a large data set (1.5 million records) in order to transfer the records from the temp table to the destination table as quickly and efficiently as possible?
I've never worked with such large data sets, so I really need some advice on this one.
Kind regards
The MERGE command in one step with appropriate timing, I suppose. As said in a comment before, it is easier to merge the data directly from the CSV: with SSIS if you do this regularly, or with BULK INSERT and MERGE if it is a one-off. stackoverflow.com/questions/23026501/…
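To illustrate the BULK INSERT route from that comment, a minimal sketch, assuming the file's six fields are loaded into a staging table that matches the CSV one-to-one (the table name, file path, and delimiters are assumptions; loading straight into TEMP_GENERIC_ARTICLE would need a format file because of its IDENTITY and datetime columns):

-- Hypothetical staging table whose columns mirror the six CSV fields.
CREATE TABLE [dbo].[STAGE_GENERIC_ARTICLE](
    [GlnCode]     [nvarchar](100) NULL,
    [Description] [nvarchar](max) NULL,
    [VendorId]    [nvarchar](100) NULL,
    [VendorName]  [nvarchar](100) NULL,
    [ItemNumber]  [nvarchar](100) NULL,
    [ItemUOM]     [nvarchar](max) NULL
);

BULK INSERT [dbo].[STAGE_GENERIC_ARTICLE]
FROM 'C:\Import\generic_articles.csv'  -- assumed path, must be reachable by the SQL Server service
WITH (
    FIELDTERMINATOR = ';',    -- assumed field delimiter
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,      -- skip a header row, if the file has one
    BATCHSIZE       = 100000, -- commit in batches to keep the transaction log manageable
    TABLOCK                   -- table lock enables a minimally logged bulk load
);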