The following C# method counts the characters in a string, treating each combining character sequence (grapheme cluster) as a single character. Here it is:
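
    using System.Data.SqlTypes;
    using System.Globalization;

    public static class StringExtensions
    {
        public static SqlInt32 GetStrLength(this string input)
        {
            if (string.IsNullOrEmpty(input))
                return 0;

            // One array entry per text element, so the array length is the count.
            return StringInfo.ParseCombiningCharacters(input).Length;
        }
    }

I then created a SQLCLR function from it to use inside SQL Server. Here is its code: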

    using System.Data.SqlTypes;
    using System.Globalization;
    using Microsoft.SqlServer.Server;

    public static class UserDefinedFunctions
    {
        [SqlFunction(IsDeterministic = true, IsPrecise = true)]
        public static SqlInt32 GetStrLength(SqlString input)
        {
            if (input.IsNull)
                return 0;

            // Same counting logic as the extension method above.
            return StringInfo.ParseCombiningCharacters(input.Value).Length;
        }
    }

The C# version works well, but in SQL Server it does not count properly. What's the problem?
Here are a few examples where the SQLCLR function cannot count correctly:
| SQLCLR version (Wrong) | non-SQLCLR version (Correct) |
|---|---|
| '👩🏻' -> 2 | '👩🏻' -> 1 |
| '👨🏻‍❤️‍💋‍👩🏼' -> 9 | '👨🏻‍❤️‍💋‍👩🏼' -> 1 |
Here is the SQL code I have run to get the length:

    SELECT dbo.GetStringLength(body) FROM notes;

And the following is the SQL code used to register the SQLCLR function:

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'clr enable', 1;
    RECONFIGURE;
    EXEC sp_configure 'clr strict security', 0;
    RECONFIGURE;

    CREATE ASSEMBLY StringUtils
    FROM 'C:\GraphemeClusters.dll'
    WITH PERMISSION_SET = SAFE;

    CREATE FUNCTION dbo.GetStringLength(@input NVARCHAR(MAX))
    RETURNS INT
    AS EXTERNAL NAME StringUtils.UserDefinedFunctions.GetStrLength;

`nchar` and `nvarchar` values will be using UCS-2 encoding instead of UTF-16 like C# expects. (They're not the same thing.)

`_SC` and `_140_` collations only really affect the behavior of built-in functions, and only in relation to supplementary characters. (I only mention this because combining characters can be either BMP or supplementary, so built-in functions should work as expected with combining characters so long as they are in the BMP.)

I tested `StringInfo.LengthInTextElements`, in both SQL Server 2017 and 2022, using the following test string, `DECLARE @Input NVARCHAR(50) = NCHAR(0x0303) + NCHAR(0x0303) + N'o' + NCHAR(0x0303) + NCHAR(0x0302) + NCHAR(0x0303) + NCHAR(0x0302);`, and the expected value of 3 was returned in all cases. So, again, please update the question with: 1) the version of SQL Server, 2) your test queries, and 3) the results.
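
To help narrow down where the two counts diverge, here is a minimal diagnostic sketch. It is not from the original post: the class `GraphemeDiagnostics` and the method `Describe` are hypothetical names made up for illustration. It reports the UTF-16 code units, the Unicode code points, and the text elements for a string, together with the runtime version, so the same input can be compared between the plain C# run and the SQLCLR run.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only; not part of the original question or answer.
public static class GraphemeDiagnostics
{
    public static string Describe(string input)
    {
        if (string.IsNullOrEmpty(input))
            return "empty input";

        // Number of UTF-16 code units (what string.Length reports).
        int codeUnits = input.Length;

        // Number of Unicode code points: a well-formed surrogate pair
        // occupies two code units but is a single code point.
        int codePoints = 0;
        for (int i = 0; i < input.Length; i++)
        {
            if (char.IsHighSurrogate(input[i])
                && i + 1 < input.Length
                && char.IsLowSurrogate(input[i + 1]))
            {
                i++; // skip the low surrogate of the pair
            }
            codePoints++;
        }

        // Number of text elements as seen by the StringInfo implementation
        // of whichever runtime is executing this code.
        int textElements = new StringInfo(input).LengthInTextElements;

        return string.Format(
            "runtime={0}; code units={1}; code points={2}; text elements={3}",
            Environment.Version, codeUnits, codePoints, textElements);
    }

    public static void Main()
    {
        // The BMP-only test string from the answer above.
        Console.WriteLine(Describe("\u0303\u0303o\u0303\u0302\u0303\u0302"));

        // WOMAN (U+1F469) followed by a skin-tone modifier (U+1F3FB).
        Console.WriteLine(Describe("\U0001F469\U0001F3FB"));
    }
}
```

The code unit and code point counts are fixed properties of the string (7 and 7 for the first sample, 4 and 2 for the second), so if those match in both environments, the input is arriving intact. The text element count is the only value that depends on the runtime's `StringInfo` implementation, which is why the sketch prints `Environment.Version` alongside it. To run the same check inside SQL Server, `Describe` could be wrapped in a `[SqlFunction]` that returns `SqlString`, under the same assumptions as the function in the question.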