We have some unit tests that check for the UTF-8 byte order mark in an XML string before it is loaded into an XmlDocument. Everything works fine on Windows 7 64-bit, but we noticed a bunch of tests failing when run under Windows 10 64-bit.

After a bit of investigation, we found that on Windows 10 the XML string is getting pruned (as if the preamble were present), while on Windows 7 it is not.

Here is the code snippet:

    public static string PruneUtf8ByteMark(string xmlString)
    {
        var byteOrderMarking = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
        if (xmlString.StartsWith(byteOrderMarking))
        {
            xmlString = xmlString.Remove(0, byteOrderMarking.Length);
        }
        return xmlString;
    }

StartsWith returns true on Windows 10 and false on Windows 7. Note that the same XML string is used in both cases; the only difference is the OS.

Any ideas? We are a bit lost here, since both PCs are x64 running the same .NET version.

edit: The string comes from a class via:

public static string XmlString = "<?xml version=\"1.0\".... 

On Windows 10, the less-than sign gets truncated because the byte order mark check returns true.

  • Where does the string come from? Commented Feb 6, 2017 at 22:01
  • @SLaks the string comes from dummy test data. It's an XML string that's wrapped in a class and accessible via public static. Commented Feb 6, 2017 at 22:02
  • hard to believe if that string is not read from a file. which value does the debugger show for byteOrderMarking in either case? Commented Feb 6, 2017 at 22:06
  • You should pass StringComparison.Ordinal as the second argument of StartsWith to avoid a culture-based comparison. Commented Feb 6, 2017 at 22:08
  • @dlatikay, yeah definitely not from a file. the mask (preamble) shows EF BB BF, but the GetString of it shows "" for both OSs. Commented Feb 6, 2017 at 22:18

1 Answer

The problem is caused by culture-sensitive comparison.

The byteOrderMarking string contains no visible character, so it is ignored during a culture-sensitive comparison.

See the following case:

    "".StartsWith("")                              // = true
    "aa".StartsWith("")                            // = true
    "aa".StartsWith("", StringComparison.Ordinal)  // = true

So every string starts with an empty string. Now with byteOrderMarking:

    var byteOrderMarking = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
    byteOrderMarking.Equals("")                                  // = False
    byteOrderMarking.Equals("", StringComparison.CurrentCulture) // = True
    byteOrderMarking.Equals("", StringComparison.Ordinal)        // = False

Now we can see that byteOrderMarking equals an empty string only under a current-culture comparison. Checking whether a string starts with byteOrderMarking is therefore like checking whether it starts with an empty string.
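As a rough sketch of the same point applied to StartsWith, the snippet below puts the two comparison modes side by side. The class and method names are hypothetical and do not come from the question. Only the ordinal results are predictable from the code alone; the culture-sensitive result depends on the runtime and OS, which is exactly what the question ran into, so no expected value is given for it.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class BomComparisonDemo
{
    // The UTF-8 preamble (bytes EF BB BF) decodes to the single character U+FEFF.
    public static readonly string Bom =
        Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());

    // Ordinal: compares UTF-16 code units by value, so a string that begins
    // with '<' (U+003C) can never match U+FEFF.
    public static bool StartsWithBomOrdinal(string s)
    {
        return s.StartsWith(Bom, StringComparison.Ordinal);
    }

    // Culture-sensitive: this is what StartsWith(string) does by default.
    // The result can vary with the platform's collation data, so it should
    // not be relied on for technical strings.
    public static bool StartsWithBomCultureSensitive(string s)
    {
        return s.StartsWith(Bom, StringComparison.CurrentCulture);
    }
}
```

Calling `BomComparisonDemo.StartsWithBomOrdinal("<?xml version=\"1.0\"?>")` returns false, while the same call on a string that really begins with U+FEFF returns true.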

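The difference between Ordinal and CurrentCulture is that the first compares the strings character by character using their numeric code unit values, whereas the second applies the linguistic rules of the current culture, which can ignore characters such as the byte order mark.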

Lastly, I suggest always using Ordinal (or OrdinalIgnoreCase) when comparing technical strings.
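Applied to the method from the question, that suggestion could look like the sketch below. The only change to the method is the added StringComparison.Ordinal argument; the wrapping class name XmlStringHelper is an assumption made so the snippet compiles on its own.

```csharp
using System;
using System.Text;

// Hypothetical wrapper class; the question does not show the real one.
public static class XmlStringHelper
{
    public static string PruneUtf8ByteMark(string xmlString)
    {
        // Decodes the UTF-8 preamble (EF BB BF) to the string "\uFEFF".
        var byteOrderMarking = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());

        // Ordinal comparison: the mark is removed only if the string
        // literally begins with U+FEFF, regardless of OS or culture.
        if (xmlString.StartsWith(byteOrderMarking, StringComparison.Ordinal))
        {
            xmlString = xmlString.Remove(0, byteOrderMarking.Length);
        }
        return xmlString;
    }
}
```

With this version, a string that begins with the less-than sign is returned unchanged, and a string that begins with U+FEFF has exactly that one character removed.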

2 Comments

Thanks for the answer. Yeah, I understand why to use Ordinal, but I still do not understand why the string comparison then is different across OS versions.
Windows 10 added support for many more languages, which may be related. The .NET Framework often depends on the Windows API, so when the OS changes, the framework can be affected.
