Removing BOM characters from AJAX-posted string

Question

My content contains multiple BOM (EF BB BF) characters. They appear in the middle of strings, and I simply want to remove all of them.

The data containing the BOMs comes from a JavaScript source, which I POST to the backend. For now, they are saved as is, but this causes errors in post-processing when the characters are interpreted and start showing up mid-content. I suspect they come from something that was copy-pasted into my editor.

I can step through the string char by char, but I don't know how to compare against the BOM. Would it somehow be possible to look at the hex values of the string's bytes and check for the three-byte sequence?

Hans Passant · Accepted Answer · 2012-10-23 09:50:27Z

11

The UTF-8 BOM bytes (EF BB BF) get decoded to the single character \ufeff, the Unicode character "zero width no-break space". You can't see them, you can't hear them. Filter them out with:

 var good = bad.Replace("\ufeff", "");
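A minimal sketch of the idea, assuming the posted bytes are decoded as UTF-8 on the backend. The class name BomDemo, the helper StripBom, and the sample bytes are made up for illustration; only Encoding.UTF8.GetString and string.Replace come from the framework.

```csharp
using System;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: removes every U+FEFF character, wherever it occurs.
    // string.Replace(string, string) replaces all occurrences, not just the first.
    public static string StripBom(string text)
    {
        return text.Replace("\uFEFF", "");
    }

    public static void Main()
    {
        // Sample input (an assumption): "ab", then the bytes EF BB BF, then "cd".
        byte[] posted = { 0x61, 0x62, 0xEF, 0xBB, 0xBF, 0x63, 0x64 };

        // Decoding as UTF-8 turns the three bytes into one char, U+FEFF.
        string bad = Encoding.UTF8.GetString(posted);
        string good = StripBom(bad);

        Console.WriteLine(bad.Length);  // 5
        Console.WriteLine(good.Length); // 4
        Console.WriteLine(good);        // abcd
    }
}
```

The point of the sketch is that, once the bytes have been decoded, there is no three-byte sequence left to search for: the string holds one invisible character per BOM.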


3 Comments

Joel Peltonen Over a year ago

Great success! One question though: might this cause problems by also removing other bytes that get translated into the same Unicode character? I doubt I'll miss any that get removed, but are there other characters like this that are important or worth mentioning?

Hans Passant Over a year ago

You can't see them, you can't hear them.

Harps Over a year ago

In JavaScript, where replace() with a string pattern only replaces the first match, remove every occurrence in the string with: const goodStr = badStr.split('\ufeff').join('');

Peter Stock · Answer · 2012-10-23 07:06:26Z

1

Try the following:

CleanString = DirtyString.Replace("\u00EF\u00BB\u00BF", null);
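One possible reason this pattern finds nothing in a correctly decoded string: "\u00EF\u00BB\u00BF" is three separate characters (ï, », ¿), which only show up if the bytes EF BB BF were decoded with a single-byte encoding such as ISO-8859-1. The sketch below illustrates the difference; the class name, the Decode helper, and the sample bytes are assumptions made for the example.

```csharp
using System;
using System.Text;

public static class BomDecodingDemo
{
    // Hypothetical helper: decodes the same bytes with a named encoding.
    public static string Decode(byte[] bytes, string encodingName)
    {
        return Encoding.GetEncoding(encodingName).GetString(bytes);
    }

    public static void Main()
    {
        // Sample input (an assumption): "a", then EF BB BF, then "b".
        byte[] posted = { 0x61, 0xEF, 0xBB, 0xBF, 0x62 };

        string asUtf8 = Decode(posted, "utf-8");
        string asLatin1 = Decode(posted, "iso-8859-1");

        // Decoded as UTF-8, the BOM is the single char U+FEFF.
        Console.WriteLine(asUtf8.Contains("\uFEFF"));               // True
        Console.WriteLine(asUtf8.Contains("\u00EF\u00BB\u00BF"));   // False

        // Decoded as ISO-8859-1, each byte becomes its own char.
        Console.WriteLine(asLatin1.Contains("\u00EF\u00BB\u00BF")); // True
    }
}
```

Which pattern matches therefore depends on how the request body was decoded before the string reached this code.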


2 Comments

Joel Peltonen Over a year ago

I tested this with string s2 = s.Replace(...) followed by Debug.WriteLine(s2);. I then copy-pasted the text from the Output window into Notepad++ and switched to the hex view, where I still see the BOM. Did I do something wrong?

Peter Stock Over a year ago

That's how it is working for me. Maybe you find this helpful.
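Instead of copying debugger output into an editor, the result could also be checked in code by looking at each char's hex value, which is the char-by-char comparison the question asks about. This is only a sketch: the class BomInspect and its CountBom and ToHex helpers are hypothetical names, and the sample string is made up.

```csharp
using System;
using System.Linq;

public static class BomInspect
{
    // Counts chars equal to U+FEFF by comparing each char directly.
    public static int CountBom(string text)
    {
        return text.Count(c => c == '\uFEFF');
    }

    // Renders each UTF-16 code unit as four hex digits, separated by spaces.
    public static string ToHex(string text)
    {
        return string.Join(" ", text.Select(c => ((int)c).ToString("X4")));
    }

    public static void Main()
    {
        string s = "a\uFEFFb\uFEFF";
        string s2 = s.Replace("\uFEFF", "");

        Console.WriteLine(ToHex(s));     // 0061 FEFF 0062 FEFF
        Console.WriteLine(CountBom(s));  // 2
        Console.WriteLine(CountBom(s2)); // 0
    }
}
```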
