2

I'm trying to take a string in SQL Server that contains an accented character (the acute e specifically) and have it convert correctly when using FOR XML PATH as follows:

SELECT 'é' AS Accent FOR XML PATH('') /* Produces: <Accent>&eacute;</Accent> */ 

However, it always comes through with the accent, or tries to convert the other characters instead. This is what I've tried so far:

SELECT 'é' AS Accent FOR XML PATH('') /* Produces: <Accent>é</Accent> */ SELECT N'é' AS Accent FOR XML PATH('') /* Produces: <Accent>é</Accent> */ SELECT REPLACE('é', 'é', '&eacute;') AS Accent FOR XML PATH('') /* Produces: <Accent>&amp;eacute;</Accent> */ SELECT '<![CDATA[' + 'é' + ']]>' AS Accent FOR XML PATH('') /* Produces: <Accent>&lt;![CDATA[é]]&gt;</Accent> */ SELECT CAST('<![CDATA[' + 'é' + ']]>' AS XML) AS Accent FOR XML PATH('') /* Produces: <Accent>é</Accent> */ 

I've looked for quite a while and can't find anything apart from casting the end result XML into a string and manually replacing the character - I'm looking for a more correct way to do this. Is this just a bug in SQL Server? It seems to convert plenty of other characters (<, >, &, etc...).

Huge thanks in advance for any assistance.

I'm using SQL Server 2008.

7
  • Do I understand this correctly: There's nothing wrong with <Accent>é</Accent> actually, but - for any reason - you need this to be <Accent>&eacute;</Accent>? Commented Feb 24, 2017 at 17:27
  • @Shnugo correct - the XML outputted by this is imported by another system that can't read the accented character correctly without it being converted. Commented Feb 24, 2017 at 17:29
  • Your first "What I tried says "Produces: <Accent>é</Accent> */". I'm confused. Commented Feb 24, 2017 at 17:29
  • @SQLDBA correct - the first query in the "What I tried" section outputs <Accent>é</Accent> Commented Feb 24, 2017 at 17:32
  • I see. You want it to look like "&eacute;". Is the other system using BULK IMPORT? The CODEPAGE option can solve the import issue. I've run into that before. Check this out: msdn.microsoft.com/en-us/library/ms191184.aspx Commented Feb 24, 2017 at 17:37

1 Answer 1

2

I don't know for sure, but I'd assume, that this is not possible without a hack.

SQL-Server treats XML as 2 byte utf-16 internally. So all characters, which are displayable within this range do not need escaping.

It is a different thing, when it comes to unprintable characters or letters, which have a declarative meaning, like <, > and & or quotes (in some cases).

I'd probably use a cast to NVARCHAR(MAX) and then a replace on string level at the point where you export this data. You won't be able to cast this back to XML:

--Push your test XML into a variable

DECLARE @xml XML= ( SELECT 'é' AS Accent FOR XML PATH('') ); SELECT @xml; --<Accent>é</Accent> 

--Cast it to string and do the replacement on string-level

SELECT REPLACE(CAST(@xml AS NVARCHAR(MAX)),'é', '&eacute;') --<Accent>&eacute;</Accent> 

--Trying to cast this result back to XML fails

SELECT CAST(REPLACE(CAST(@xml AS NVARCHAR(MAX)),'é', '&eacute;') AS XML); 

XML parsing: line 1, character 16, well formed check: undeclared entity

Obvioulsy SQL Server's XML engine doesn't even know this entity...

Sign up to request clarification or add additional context in comments.

1 Comment

Bummer. My main issue with this is the size of the XML that needs to go through this - probably best to do it outside SQL Server. Thanks a ton for your above example and explanation. If I don't get anything else I'll accept this as the answer.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.