I'm trying to use escape characters in JS/HTML, but I can't figure out how. I've seen examples that use .innerHTML, but I don't understand how they work. Can someone please explain it in simple terms?
- Question title doesn't match question body. Which one's correct? – Álvaro González, Jan 30, 2014 at 16:13
1 Answer
If you add content as raw text (like, as the value of a text node), and then query the .innerHTML of the container, you get back escaped HTML because that's what it'd have to look like if you were to set the .innerHTML:
    var d = document.createElement('span');
    var t = document.createTextNode("<b>Hello World</b>");
    d.appendChild(t);
    console.log(d.innerHTML); // logs &lt;b&gt;Hello World&lt;/b&gt;

It's just the way that the .innerHTML mechanism behaves.
According to the MDN documentation, the only characters that are affected are <, >, and &. There are times when it's useful to encode other characters with HTML entities. The most common situation I think is when you want to use quotes in an HTML attribute.
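To illustrate the attribute case, here is a minimal sketch. It assumes a hypothetical helper named escapeAttr (not a built-in and not part of any library) that encodes both kinds of quotes in addition to &, <, and >, so the value can sit inside a quoted attribute without ending it early:

```javascript
// Hypothetical helper (not a built-in): encodes the characters that could
// break out of a quoted HTML attribute value.
function escapeAttr(value) {
  var rules = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(value).replace(/[&<>"']/g, function (ch) {
    return rules[ch];
  });
}

var title = 'She said "hi" & left';
var html = '<span title="' + escapeAttr(title) + '">hover me</span>';
console.log(html);
// <span title="She said &quot;hi&quot; &amp; left">hover me</span>
```

Without the escaping, the first inner double quote would close the title attribute and the rest of the text would be parsed as markup.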
An alternative to using the browser's DOM behavior is to use your own JavaScript function. Here's a (slightly modified) version of the code used in the doT template library:
    function encodeHTMLSource() {
      // Map each "naughty" character to its HTML entity.
      var encodeHTMLRules = { "&": "&#38;", "<": "&#60;", ">": "&#62;", '"': '&#34;', "'": '&#39;', "/": '&#47;' },
          // Match "&" only when it does not already start an entity.
          matchHTML = /&(?!#?\w+;)|<|>|"|'|\//g;
      return function() {
        return this ? this.replace(matchHTML, function(m) { return encodeHTMLRules[m] || m; }) : this;
      };
    }
    String.prototype.encodeHTML = encodeHTMLSource();

This function is designed to be added to the String prototype, which some might find distasteful (that seems to be a recent change; my older version doesn't do this). The idea is that it uses a closure to keep a mapping from the "naughty" characters to their HTML entity equivalents, as well as a regular expression to find the characters to convert. Once you've done the above, you can escape any string with:
    var escaped = "<b>Hello World</b>".encodeHTML();

The regular expression is written such that it avoids re-encoding existing HTML entities.
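If you'd rather not touch the String prototype, the same idea works as a plain function. This is only a sketch under that assumption: the name encodeHTML and the wrapping in an immediately invoked function are my own choices, not something doT provides. It also shows the regular expression leaving an existing entity alone:

```javascript
// Standalone variant of the encoder above (hypothetical name: encodeHTML).
var encodeHTML = (function () {
  var rules = { "&": "&#38;", "<": "&#60;", ">": "&#62;", '"': "&#34;", "'": "&#39;", "/": "&#47;" };
  // "&" is matched only when it does not already start an entity
  // such as &amp; or &#38; (that is what the negative lookahead checks).
  var matchHTML = /&(?!#?\w+;)|<|>|"|'|\//g;
  return function (str) {
    // Empty, null, and undefined inputs are returned unchanged.
    return str ? String(str).replace(matchHTML, function (m) { return rules[m] || m; }) : str;
  };
})();

console.log(encodeHTML("<b>Tom & Jerry</b>"));
// &#60;b&#62;Tom &#38; Jerry&#60;&#47;b&#62;
console.log(encodeHTML("already &amp; escaped"));
// already &amp; escaped
```

One side effect of the lookahead worth knowing: any text that merely looks like an entity (an ampersand, some word characters, then a semicolon) is also left untouched, whether or not it was meant as one.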