Additional characters are coming during bulk insert

Question

I am trying to bulk insert the first row of a CSV file into a table with only one column, but I am getting some extra characters ('n++') at the beginning, like this:

n++First Column;Second Column;Third Column;Fourth Column;Fifth Columnm;Sixth Column

CSV file contents are like:

First Column;Second Column;Third Column;Fourth Column;Fifth Columnm;Sixth Column

You can find the test.csv file here

And this is the code I am using to get the first row data in a table

declare @importSQL nvarchar(2000)
declare @tempstr varchar(max)
declare @path varchar(100)
SET @path = 'D:\test.csv'
CREATE TABLE #tbl (line VARCHAR(max))
SET @importSQL = 'BULK INSERT #tbl FROM ''' + @path + ''' WITH ( LASTROW = 1, FIELDTERMINATOR = ''\n'', ROWTERMINATOR = ''\n'' )'
EXEC sp_executesql @stmt=@importSQL
SET @tempstr = (SELECT TOP 1 RTRIM(REPLACE(Line, CHAR(9), ';')) FROM #tbl)
print @tempstr
drop table #tbl

Any idea where this extra 'n++' is coming from?

I don't think TRIM exists. Does it?

– Mohammad Nadeem, Commented Dec 22, 2010 at 5:00

Philip Fourie · Accepted Answer · 2010-12-15 06:51:06Z

4

It seems UTF-8 files are not supported by SQL Server 2005 and 2008; support will only be available in version 11!

https://connect.microsoft.com/SQLServer/feedback/details/370419/bulk-insert-and-bcp-does-not-recognize-codepage-65001


2 Comments

Mohammad Nadeem Over a year ago

It seems there is no such option as CODEPAGE = '65001', since the MSDN list of code pages for SQL Server 2005 doesn't include it. See msdn.microsoft.com/en-us/library/ms186356(SQL.90).aspx

Mohammad Nadeem Over a year ago

I have reported this issue to Microsoft feedback. Let's see what they have to say. connect.microsoft.com/SQLServer/feedback/details/631379/…

Pete Carter · Accepted Answer · 2012-10-20 08:16:05Z

The extra characters are caused by the encoding. You can use Notepad to change the encoding format from UTF-8 to Unicode. This removed the 'n++' on the first row.
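When re-saving by hand in Notepad is impractical, the same conversion could be scripted. The following is a minimal Python sketch and not part of the original answer; the function name `utf8_to_utf16` is hypothetical, and it assumes that Notepad's "Unicode" option means UTF-16 little-endian with a byte order mark. A file converted this way would likely need to be loaded with DATAFILETYPE = 'widechar'.

```python
import codecs
from pathlib import Path


def utf8_to_utf16(src_path, dst_path):
    """Re-encode a UTF-8 file (with or without a BOM) as UTF-16 LE with a BOM."""
    # "utf-8-sig" drops a leading UTF-8 BOM (EF BB BF) if one is present.
    text = Path(src_path).read_bytes().decode("utf-8-sig")
    # Write the UTF-16 LE BOM (FF FE) explicitly, followed by the encoded text.
    # Working on bytes keeps the original line endings untouched.
    Path(dst_path).write_bytes(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))
```

A call such as `utf8_to_utf16(r"D:\test.csv", r"D:\test_utf16.csv")` would write the converted copy next to the original; the second path is only an example.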

Philip Fourie · Accepted Answer · 2010-12-14 06:25:12Z

3

It might be the Unicode Byte Order Mark that is being picked up.

I suggest you try setting the DATAFILETYPE option as part of your statement. See the MSDN documentation for more detail: http://msdn.microsoft.com/en-us/library/aa173832%28SQL.80%29.aspx
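To illustrate why a byte order mark can show up as stray characters, here is a small Python sketch (not part of the original answer). It decodes the three UTF-8 BOM bytes under two single-byte code pages. It assumes the server's OEM code page is 437; under that assumption the bytes read as '∩╗┐', which would plausibly degrade to 'n++' when stored in a VARCHAR column through a best-fit conversion.

```python
import codecs

# The UTF-8 byte order mark is the three bytes EF BB BF.
bom = codecs.BOM_UTF8

# Read as OEM code page 437 (an assumption about the server's OEM page).
as_oem = bom.decode("cp437")     # '∩╗┐'

# Read as Windows-1252, i.e. each byte taken as one Latin-1 style character.
as_ansi = bom.decode("cp1252")   # 'ï»¿'

print(bom.hex(" "))                          # ef bb bf
print(["U+%04X" % ord(c) for c in as_oem])   # ['U+2229', 'U+2557', 'U+2510']
print(["U+%04X" % ord(c) for c in as_ansi])  # ['U+00EF', 'U+00BB', 'U+00BF']
```

The second decoding matches the 'ï»¿' that appears once the bytes are no longer translated from the OEM code page.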


7 Comments

Mohammad Nadeem Over a year ago

I tried DATAFILETYPE = 'char' and DATAFILETYPE = 'widechar', but neither helped.

Mohammad Nadeem Over a year ago

I also tried CODEPAGE='RAW', and n++ now changed to ï»¿, which is the BOM for UTF-8. But I am still not able to resolve the issue.

Philip Fourie Over a year ago

Nadeem, can you perhaps get the byte order mark using a hex viewer on your file?

Mohammad Nadeem Over a year ago

Well, I tried PartCopy to see the hex code of the csv file. According to the hex code it produced, there is a ï»¿ sequence before First Column. But the question still remains: how to keep bulk insert from reading this BOM.

MikeAinOz Over a year ago

Try making the data type NVARCHAR
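The hex check discussed in the comments above can also be done without a dedicated viewer. The following Python sketch is an illustration only; the helper name `detect_bom` is hypothetical. It reads the first four bytes of a file and reports which known byte order mark, if any, they start with.

```python
import codecs

# Known byte order marks. UTF-32 LE comes before UTF-16 LE because its BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(path):
    """Return (encoding_name, bom_hex) for a leading BOM, or (None, '') if none is found."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name, bom.hex(" ")
    return None, ""
```

For a file like the one described in the question, `detect_bom(r"D:\test.csv")` would be expected to return `("utf-8", "ef bb bf")`.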


score 1 · Accepted Answer · 2014-04-07 16:59:21Z

Unfortunately, old SQL Server versions do not support UTF-8. Add the CODEPAGE parameter to the BULK INSERT statement. Change the code in your question as follows:

SET @importSQL = 'BULK INSERT #tbl FROM ''' + @path + ''' WITH ( LASTROW = 1, FIELDTERMINATOR = ''\n'', ROWTERMINATOR = ''\n'' , CODEPAGE=''65001'')'

Note that your file must be in UTF-8 format. The problem is that if you upgrade your server from 2005 to 2008, code page 65001 (UTF-8) is not supported, and you will get the "codepage not supported" message.
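On a server that rejects code page 65001, one possible workaround (not part of this answer) is to remove the BOM from the file before running BULK INSERT. The Python sketch below shows the idea; the function name `strip_utf8_bom` is hypothetical. It only removes the three leading BOM bytes, so any non-ASCII characters in the data would still be misread by a non-UTF-8 code page. The header row shown in the question is plain ASCII, so it should be unaffected.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(src_path, dst_path):
    """Copy src to dst without a leading UTF-8 BOM. Return True if a BOM was removed."""
    data = Path(src_path).read_bytes()
    had_bom = data.startswith(codecs.BOM_UTF8)
    if had_bom:
        # Drop exactly the three BOM bytes (EF BB BF); the rest is copied as-is.
        data = data[len(codecs.BOM_UTF8):]
    Path(dst_path).write_bytes(data)
    return had_bom
```

The @path variable in the statement above would then point at the cleaned copy instead of the original file.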

brian_ds · Accepted Answer · 2019-08-16 21:17:30Z

In later versions of SQL Server you can add '-C 65001' to the bcp command to tell it to use UTF-8 encoding. This will remove the n++ from the first line. Note that it is a capital C, and do not include the quotes when you type the command.
