4

I am trying to bulk insert the first row of a CSV file into a table with only one column, but I am getting some extra characters ('n++') at the beginning, like this:

n++First Column;Second Column;Third Column;Fourth Column;Fifth Columnm;Sixth Column 

CSV file contents are like:

First Column;Second Column;Third Column;Fourth Column;Fifth Columnm;Sixth Column 

You can find the test.csv file here

And this is the code I am using to get the first row into a table:

declare @importSQL nvarchar(2000)
declare @tempstr varchar(max)
declare @path varchar(100)
SET @path = 'D:\test.csv'
CREATE TABLE #tbl (line VARCHAR(max))
SET @importSQL = 'BULK INSERT #tbl FROM ''' + @path + ''' WITH ( LASTROW = 1, FIELDTERMINATOR = ''\n'', ROWTERMINATOR = ''\n'' )'
EXEC sp_executesql @stmt=@importSQL
SET @tempstr = (SELECT TOP 1 RTRIM(REPLACE(Line, CHAR(9), ';')) FROM #tbl)
print @tempstr
drop table #tbl

Any idea where this extra 'n++' is coming from?

1
  • I don't think TRIM exists. Does it? Commented Dec 22, 2010 at 5:00

5 Answers 5

4

It seems UTF-8 files are not supported by SQL Server 2005 and 2008; support will only be available in version 11!

https://connect.microsoft.com/SQLServer/feedback/details/370419/bulk-insert-and-bcp-does-not-recognize-codepage-65001


2 Comments

It seems there is no such thing as CODEPAGE = '65001', since the MSDN list of code pages for SQL Server 2005 doesn't include it. See msdn.microsoft.com/en-us/library/ms186356(SQL.90).aspx
I have reported this issue to Microsoft feedback. Let's see what they have to say. connect.microsoft.com/SQLServer/feedback/details/631379/…
4

The extra characters are caused by the encoding. You can use Notepad to change the encoding from UTF-8 to Unicode. This removed the 'n++' on the first row.
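
The same conversion can be scripted rather than done by hand in Notepad. Below is a minimal sketch in Python, assuming the source file is UTF-8 (with or without a BOM); the function name and the paths in the usage comment are hypothetical. Notepad's "Unicode" option corresponds to UTF-16 little-endian with a BOM, so that is what the sketch writes.

```python
import codecs


def utf8_to_utf16le(src_path, dst_path):
    """Re-encode a UTF-8 text file as UTF-16 LE with a BOM."""
    # 'utf-8-sig' drops a leading UTF-8 BOM if present and is harmless otherwise.
    # newline="" keeps the original line endings (\r\n) untouched.
    with open(src_path, "r", encoding="utf-8-sig", newline="") as src:
        text = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(codecs.BOM_UTF16_LE)        # FF FE marker
        dst.write(text.encode("utf-16-le"))   # payload without a second BOM


# Hypothetical usage:
# utf8_to_utf16le(r"D:\test.csv", r"D:\test_utf16.csv")
```

A file converted this way would most likely need to be loaded with DATAFILETYPE = 'widechar' in the BULK INSERT statement, since it is no longer a single-byte character file.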

Comments

3

It might be the Unicode Byte Order Mark that is being picked up.

I suggest you try setting the DATAFILETYPE option as part of your statement. See the MSDN documentation for more detail: http://msdn.microsoft.com/en-us/library/aa173832%28SQL.80%29.aspx

7 Comments

Tried DATAFILETYPE = 'char' and DATAFILETYPE = 'widechar'. But didn't help.
I also tried CODEPAGE='RAW' and n++ now changed to ï»¿, which is the BOM for UTF-8. But I am still not able to resolve the issue.
Nadeem, can you perhaps get the byte order mark using a hex viewer on your file?
Well, I tried PartCopy to see the hex code of the csv file. According to the hex code it produced, there is a ï»¿ character sequence before First Column. But the question still remains: how do I stop bulk insert from reading this BOM?
Try making the data type NVARCHAR
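
To check for the BOM without a hex viewer, and to produce a copy of the file without it, something like the following Python sketch may help. The function names and paths are made up for illustration. It also shows how the three BOM bytes (EF BB BF) decode under code page 1252 (ï»¿) and under OEM code page 437 (∩╗┐). A best-fit conversion of those 437 characters into a varchar column is a plausible source of 'n++', but that last step is an assumption and is not reproduced here.

```python
import codecs


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


def strip_utf8_bom(src_path, dst_path):
    """Copy src to dst byte for byte, dropping a leading UTF-8 BOM if present."""
    with open(src_path, "rb") as src:
        data = src.read()
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    with open(dst_path, "wb") as dst:
        dst.write(data)


# How the BOM bytes appear when misread as single-byte code pages.
bom_as_cp1252 = codecs.BOM_UTF8.decode("cp1252")  # 'ï»¿'
bom_as_cp437 = codecs.BOM_UTF8.decode("cp437")    # '∩╗┐'

# Hypothetical usage:
# if has_utf8_bom(r"D:\test.csv"):
#     strip_utf8_bom(r"D:\test.csv", r"D:\test_nobom.csv")
```

For the sample file, which contains only ASCII text after the BOM, the BOM-free copy should load cleanly with the original statement. A file with non-ASCII characters would still be misread by server versions that lack UTF-8 support.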
1

Unfortunately, old SQL Server versions do not support UTF-8. Add the CODEPAGE parameter to the BULK INSERT statement. Change the code in your question as follows:

SET @importSQL = 'BULK INSERT #tbl FROM ''' + @path + ''' WITH ( LASTROW = 1, FIELDTERMINATOR = ''\n'', ROWTERMINATOR = ''\n'' , CODEPAGE=''65001'')' 

Note that your file must be in UTF-8 format. The problem is that if you upgrade your server from 2005 to 2008, code page 65001 (UTF-8) is not supported and you will get a "codepage not supported" message.

1 Comment

Is there a way to use this in SQL 2012?
0

In later versions of SQL Server you can add '-C 65001' to the bcp command to tell it to use UTF-8 encoding. This will remove the n++ from the first line. That is a capital C, and the quotes are not part of what you type.

Comments
