5

I want to insert rows with an optional datetime value (per row) using T-SQL BULK INSERT on a SQL Server 2008 R2.

A table could look like this:

CREATE TABLE [dbo].[tbl_bulk_insert_datetime_issue] ( [id] [int] NOT NULL, [description] [varchar](20) NOT NULL, [datetime] [datetime] NULL, CONSTRAINT [pk_bulk_insert_datetime_issue] PRIMARY KEY CLUSTERED ( [id] ASC )) 

The insert batch:

BULK INSERT [dbo].[tbl_bulk_insert_datetime_issue] FROM 'C:\temp\bulkinsertsample.csv' WITH ( FIELDTERMINATOR=';' ) 

If I am to insert the following CSV content:

1;row01; 2;row02;20130401 3;row03; 4;row04;20130515 

The table' content was parsed and inserted as expected:

id description datetime ----------- -------------------- ----------------------- 1 row01 NULL 2 row02 2013-04-01 00:00:00.000 3 row03 NULL 4 row04 2013-05-15 00:00:00.000 


But if the CSV file contains invalid data for the optional datetime

1;row01; 2;row02;20130401 3;row03;not_a_datetime 4;row04;20130515 

SQL Server inserts the last valid datetime value instead of discarding the row with an error message:

id description datetime ----------- -------------------- ----------------------- 1 row01 NULL 2 row02 2013-04-01 00:00:00.000 3 row03 2013-04-01 00:00:00.000 4 row04 2013-05-15 00:00:00.000 

If I am to insert a row with an invalid datetime and no valid parsable datetime at all

1;row01;not_a_datetime 

SQL Server inserts the default value of DATETIME:

id description datetime ----------- -------------------- ----------------------- 1 row01 1900-01-01 00:00:00.000 

Why does Sql Server insert the (for that specific row) invalid datetime when using bulk insert?
If I try to insert rubbish to an INT column the row will not be inserted by the bulk batch and an error will be thrown...


UPDATE:
I will share some information about the environment I used to reproduce the behaviour.

  • OS: Windows 7 Enterprise SP1 x64
  • regional settings: de-DE
  • Microsoft SQL Server 2008 R2 (SP1) - 10.50.2500.0 (X64) Enterprise Edition
  • database collation: Latin1_General_100_CS_AS
  • file encoding: UCS-2 Little Endian (with Windows CRLF)

As hinted by Max Vernon the file encoding seems to be at fault. Setting the encoding to UTF-8 without BOM the bulk insert will throw the error I expected and discards the invalid row.

4
  • 6
    Wow, this is interesting, smells like a bug to me. Commented Nov 4, 2013 at 17:08
  • 1
    Can you share what your ANSI options for your SSMS session are set to when you execute this? Commented Nov 4, 2013 at 17:26
  • @MikeFal following are checked: QUOTED_IDENTIFIER, ANSI_NULL_DFLT_ON, ANSI_PADDING, ANSI_WARNINGS, ANSI_NULLS - How do these settings influence the behaviour of a bulk insert? Commented Nov 5, 2013 at 9:01
  • 2
    @AaronBertrand As suggested I have addressed a bug here: connect.microsoft.com/SQLServer/feedback/details/807767/… Commented Nov 5, 2013 at 13:41

2 Answers 2

4

On SQL Server 2005, and SQL Server 2012, I did the following:

USE tempdb; CREATE TABLE ImpTest ( ImpTestID INT NULL , ImpTestDate DATETIME NULL ); GO BULK INSERT ImpTest FROM 'C:\SQLServer\ImportTest.txt' WITH (FIELDTERMINATOR=','); SELECT * FROM ImpTest; DROP TABLE ImpTest; 

With the following data:

1, 2,20130406 3,NOT_DATE 4, 5,20130409 

On both servers, I received the following:

Msg 4864, Level 16, State 1, Line 1 Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 3, column 2 (ImpTestDate). 

Perhaps, as Aaron noted, this is a bug that needs to be addressed through http://connect.microsoft.com

I'm wondering about some of the details of your system, including locale settings, collations, physical format of the import file (is it from a Unix system? etc).

1
  • 3
    The physical file was the hint I needed... For interest: I could reproduce the behaviour on a Sql Server 2005 with file encoding UCS-2 Little Endian and collation set to Latin1_General_CI_AS - the row with the invalid data was inserted with the wrong datetime. But setting the file encoding to UTF-8 without BOM threw the error I expected and the row was not inserted. Commented Nov 5, 2013 at 9:45
-2

Think outside the box. Ask yourself why you are holding the data in the first place. What is the database for? Well, holding data of course! But why, what is its use? It’s to make data available to reports or to a front end. In that sense, it doesn’t matter whether a date is held in a date or a character field. I’ve seen solutions offered which say to import to a staging table and then run scripts to transfer and convert the data to the final destination. Why? Just import the data and then, if you must, create a view which does the conversion. Make the view rather than the table available to reports or front end processes. From a security point of view, that is probably best practise anyway. Simple Example:

CREATE TABLE t_Input ( StartDate VARCHAR(50), EndDate VARCHAR(50) ) GO CREATE VIEW vw_Output AS SELECT StartDate = CONVERT(datetime, StartDate, 101), EndDate = CONVERT(datetime, EndDate, 101) FROM t_Input 

Then make the view available to the BI or Front end developers rather than the base table.

1
  • I fail to see how this answers the question. Commented Jan 24, 2017 at 9:06

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.