Regexp to wrap each word on HTML page

Question

Is it possible to wrap each word on an HTML page in a span element? I'm trying something like

/(\s*(?:<\/?\w+[^>]*>)|(\b\w+\b))/g

but the results are far from what I need.

Thanks in advance!

You can't parse HTML with regex; only Chuck Norris can. stackoverflow.com/questions/1732348/… – stewe, Commented Aug 21, 2011 at 21:29
Of course you can use regexes to parse HTML. In fact, sometimes you even should. However, JavaScript has some of the most horrible regexes of any programming language anywhere. The XRegExp plugin helps, but it still sucks. It's easier to teach a pig to sing, and less annoying. Either do all Real™ work server-side, where you can use a Real™ programming language, or else be prepared to improvise a 6-voice fugue for unaccompanied porcine chorus. – tchrist, Commented Aug 22, 2011 at 0:16
Thanks guys, it seems I need to look in the direction of getting all text nodes and working with them. – Roman, Commented Aug 22, 2011 at 7:45
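For what it's worth, when the input is an HTML string rather than a live page and the markup is simple, a single regex that alternates between "a tag" and "a run of text" gets reasonably close to what the question is after. This is only a sketch under loud assumptions: wrapWordsInHtml is a hypothetical name, and it assumes no HTML comments, no script/style bodies and no ">" inside attribute values. It treats any whitespace-separated run as a word (so punctuation stays attached) instead of using \w+, which keeps entities such as &amp; from being cut in half.

```javascript
// Hypothetical helper: wrap whitespace-separated words in an HTML *string*.
// Assumes simple markup: no comments, no <script>/<style> bodies and no ">"
// inside attribute values. It is a sketch, not an HTML parser.
function wrapWordsInHtml(html) {
  // Group 1: a tag such as <p class="x"> or </b>. Group 2: a run of text.
  return html.replace(/(<[^>]*>)|([^<]+)/g, function (match, tag, text) {
    if (tag) {
      return tag; // leave tags exactly as they were
    }
    // \S+ rather than \w+ so entities like &amp; are never split apart.
    return text.replace(/\S+/g, "<span>$&</span>");
  });
}

console.log(wrapWordsInHtml('<p class="x">Hello <b>big</b> world</p>'));
// <p class="x"><span>Hello</span> <b><span>big</span></b> <span>world</span></p>
```

Anything beyond simple markup (comments, inline scripts, attributes containing ">") is where the text-node approach from the comments becomes the safer choice.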

jAndy · Accepted Answer · 2011-08-22 08:08:38Z

Well, I won't ask for the reason; you could do it like this:

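    // Recursively replace every text node under `nodes` with one <span> per word.
    function getChilds( nodes ) {
        var len = nodes.length;
        // Walk backwards so replacing a node never shifts the index of a
        // node that has not been visited yet.
        while( len-- ) {
            var node = nodes[len];
            if( node.nodeType === 3 ) {
                // Skip whitespace-only text (e.g. indentation between tags).
                if( !/\S/.test( node.textContent ) ) { continue; }
                var parent = node.parentNode,
                    frag = document.createDocumentFragment();
                node.textContent.split(/\s+/).forEach(function( word ) {
                    if( !word ) { return; } // empty string from leading/trailing whitespace
                    var s = document.createElement('span');
                    s.textContent = word + ' ';
                    frag.appendChild(s);
                });
                // Put the spans where the text node was, so the word order is kept.
                parent.replaceChild( frag, node );
            } else if( node.childNodes && node.childNodes.length &&
                       !/^(SCRIPT|STYLE)$/.test( node.nodeName ) ) {
                // Recurse into elements, but leave script and style contents alone.
                getChilds( node.childNodes );
            }
        }
    }
    getChilds( document.body.childNodes );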

I have to admit I haven't tested the code yet; it was just the first thing that came to mind. It might be buggy or fail completely, but in that case I know the gentle and kind Stack Overflow community will kick my ass and downvote like hell :-p
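The answer's approach appends a trailing space to every word, which normalizes the page's original spacing (tabs, newlines and double spaces all become one space). A minimal alternative sketch, assuming a browser DOM: split with a capturing group so the whitespace runs are kept, and put them back as plain text nodes between the spans. The names splitWords and wrapTextNode are hypothetical, chosen for illustration.

```javascript
// Hypothetical helper: split a string into alternating word and whitespace
// pieces. The capturing group makes split() keep the separators.
function splitWords(text) {
  return text.split(/(\s+)/).filter(function (piece) {
    // Leading/trailing whitespace produces empty strings at the ends.
    return piece !== "";
  });
}

// Hypothetical helper: replace one text node with a <span> per word, keeping
// whitespace as plain text nodes so the layout does not change.
// Requires a browser DOM; it is only defined here, not called.
function wrapTextNode(node) {
  var doc = node.ownerDocument;
  var frag = doc.createDocumentFragment();
  splitWords(node.nodeValue).forEach(function (piece) {
    if (/^\s+$/.test(piece)) {
      frag.appendChild(doc.createTextNode(piece));
    } else {
      var span = doc.createElement("span");
      span.textContent = piece;
      frag.appendChild(span);
    }
  });
  // Inserting the fragment moves all of its children into place at once.
  node.parentNode.replaceChild(frag, node);
}

console.log(JSON.stringify(splitWords("  Hello big\nworld ")));
// ["  ","Hello"," ","big","\n","world"," "]
```

In a page, wrapTextNode would be called once per text node found by whatever traversal is used; because it only replaces the node it is given, it is safe to call on nodes collected into an array beforehand.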

Why this line: var each = Array.prototype.forEach; ? There doesn't seem to be any point to it.
Yeah, the first line is confusing; could you explain it? Anyway, with some modification this solved my problem. Thanks!
@Brock: Yay, you're right. That's a leftover from a previous version. I'll remove it.

Community · Answer · 2017-05-23 12:19:52Z

You're going to have to get down to the "Text" nodes to make this happen. Unless you limit it to a specific tag, you really have to traverse every element on the page, wrap its words, and re-append them.

With that said, try something like the approach a garble post uses (minus the filtering for words of 4+ characters and the mixing up of their letters).
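One safe way to "get down to the Text nodes" is to collect them into a plain array first and only then modify them, so the traversal never walks a tree that is changing under it. A sketch, with hypothetical names (SKIP_TAGS, collectTextNodes) and an assumed list of tags to skip; it only reads nodeType, nodeName, nodeValue and childNodes, so it also accepts plain objects of that shape outside a browser.

```javascript
// Hypothetical list of elements whose text should not be wrapped: their
// content is code, CSS or form data rather than visible prose.
var SKIP_TAGS = { SCRIPT: true, STYLE: true, TEXTAREA: true, NOSCRIPT: true };

// Collect every non-blank text node under `root`, in document order, into a
// plain array. Nothing is modified here, so the walk never sees a tree that
// is changing under it; the caller can replace the nodes afterwards.
function collectTextNodes(root) {
  var found = [];
  (function walk(node) {
    for (var i = 0; i < node.childNodes.length; i++) {
      var child = node.childNodes[i];
      if (child.nodeType === 3) {                 // Node.TEXT_NODE
        if (/\S/.test(child.nodeValue)) {
          found.push(child);
        }
      } else if (child.nodeType === 1 &&          // Node.ELEMENT_NODE
                 !SKIP_TAGS[child.nodeName.toUpperCase()]) {
        walk(child);
      }
      // Other node types (comments, etc.) are ignored.
    }
  })(root);
  return found;
}
```

In a page this would be called as collectTextNodes(document.body), and each returned node then replaced with its word spans; since the array is a snapshot, the replacements cannot confuse the walk.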

Victor · Answer · 2011-08-21 22:17:33Z

To get all words inside span tags on the current page, you can use:

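    var spans = document.body.getElementsByTagName('span');
    // getElementsByTagName always returns a collection, so test its length.
    if (spans.length) {
        // Use an index loop: for...in over an HTMLCollection also visits
        // properties such as "length" and "item".
        for (var i = 0; i < spans.length; i++) {
            // Only report spans whose entire content is a single word.
            if (/^\w+$/.test(spans[i].innerHTML)) {
                alert(spans[i].innerHTML);
            }
        }
    } else {
        alert('span tags not found');
    }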

My understanding is that the goal isn't to filter words based on whether they're already in a span, but to wrap every single word in a new span. ...maybe I'm misinterpreting?

shesek · Answer · 2011-08-21 22:50:17Z

You should probably start off by getting all the text nodes in the document, and working with their contents instead of on the HTML as a plain string. It really depends on the language you're working with, but you could usually use a simple XPath like //text() to do that.

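In JavaScript, that would be document.evaluate('.//text()', document.body, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null); then iterate over the results and work with each text node separately. (The leading . keeps the search inside document.body; a bare //text() starts from the document root, so it would also pick up text in the head.)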
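A sketch of the XPath route, assuming a browser's document.evaluate in an HTML document. The expression adds two predicates to the plain text() step: [normalize-space()] drops whitespace-only nodes, and not(ancestor::script or ancestor::style) skips code and CSS. An ordered snapshot is used because, unlike the iterator result types, it remains usable while the nodes are being replaced. The function name textNodesByXPath is hypothetical.

```javascript
// Same value as XPathResult.ORDERED_NODE_SNAPSHOT_TYPE; written as a number
// so the function itself does not depend on the XPathResult global.
var ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical helper: return the visible, non-blank text nodes under
// `context` as a plain array, using the document's XPath engine.
function textNodesByXPath(doc, context) {
  var snapshot = doc.evaluate(
    // .//text()            every text node below the context node
    // [normalize-space()]  ...that contains something besides whitespace
    // [not(ancestor::...)] ...and is not inside <script> or <style>
    ".//text()[normalize-space()][not(ancestor::script or ancestor::style)]",
    context, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  var nodes = [];
  for (var i = 0; i < snapshot.snapshotLength; i++) {
    nodes.push(snapshot.snapshotItem(i));
  }
  return nodes;
}
```

In a page it would be called as textNodesByXPath(document, document.body), after which each returned node can be replaced with its word spans.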

mVChr · Answer · 2011-08-22 05:39:29Z

See demo

Here's how I did it; it may need some tweaking...

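    var wrapWords = function(el) {
        // Elements whose contents should be left alone.
        var skipTags = { style: true, script: true, iframe: true, a: true },
            child, tag;
        // Walk backwards: splitting a text node inserts new siblings after
        // the current index, so earlier indices stay valid.
        for (var i = el.childNodes.length - 1; i >= 0; i--) {
            child = el.childNodes[i];
            if (child.nodeType == 1) {           // element: recurse
                tag = child.nodeName.toLowerCase();
                if (!(tag in skipTags)) {
                    wrapWords(child);
                }
            } else if (child.nodeType == 3 && /\w+/.test(child.textContent)) {
                var si, spanWrap;
                // Peel words off the front of the text node one at a time.
                while ((si = child.textContent.indexOf(' ')) >= 0) {
                    if (si == 0) {
                        // Leading space: leave it as a bare text node.
                        child.splitText(1);
                        child = child.nextSibling;
                    } else {
                        // Split after the word and wrap the word in a span.
                        child.splitText(si);
                        spanWrap = document.createElement("span");
                        // textContent, not innerHTML, so text such as "<b>"
                        // is not parsed as markup.
                        spanWrap.textContent = child.textContent;
                        child.parentNode.replaceChild(spanWrap, child);
                        child = spanWrap.nextSibling;
                    }
                }
                // Wrap the last word (skip the empty node a trailing space leaves).
                if (child != null && child.textContent) {
                    spanWrap = document.createElement("span");
                    spanWrap.textContent = child.textContent;
                    child.parentNode.replaceChild(spanWrap, child);
                }
            }
        }
    };
    wrapWords(document.body);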

