regex exclude certain tag

Question

I'm cleaning the output created by a WYSIWYG editor that, instead of inserting a line break, simply creates an empty p tag. It sometimes also creates other empty tags that aren't needed.

I have a regex to remove all empty tags, but I want to exclude empty p tags from it. How do I do that?

    let s = "<h1>test</h1><h1></h1><p>a</p><p></p><h2></h2>";
    s = s.trim().replace(/<(\w*)\s*[^\/>]*>\s*<\/\1>/g, '');
    console.log(s);

Hi there - a bit of Stack Overflow tradition is to share this link with anyone attempting to use regex to match HTML content - stackoverflow.com/questions/1732348/…
– Lix, Commented May 16, 2018 at 8:09
I'd suggest using a dedicated HTML parsing library for this task: there are so many edge cases to handle that a regex will get very complex and hard to manage.
– Lix, Commented May 16, 2018 at 8:10
Your case doesn't really fall among the exceptions where regex could be suitable for HTML, I fear. If using a parser bothers you, you can still inject the markup into a div and filter its content using the DOM.
– Kaddath, Commented May 16, 2018 at 8:11

Matus Dubrava · Accepted Answer · 2018-05-16 08:32:47Z

You can use DOMParser to be on the safe side.

    let s = "<h1>test</h1><h1></h1><p>a</p><p></p><h2></h2>";
    const parser = new DOMParser();
    const doc = parser.parseFromString(s, 'text/html');
    const elems = doc.body.querySelectorAll('*');

    [...elems].forEach(el => {
      if (el.textContent === '' && el.tagName !== 'P') {
        el.remove();
      }
    });

    console.log(doc.body.innerHTML);

ibrahim tanyalcin · Accepted Answer · 2018-05-16 08:18:59Z

I understand that you want to use regex for that, but there are better ways. Consider using DOMParser:

    var x = "<h1>test</h1><h1></h1><p>a</p><p></p><h2></h2>";
    var parse = new DOMParser();
    var doc = parse.parseFromString(x, "text/html");
    Array.from(doc.body.querySelectorAll("*"))
      .filter((d) => !d.hasChildNodes() && d.tagName.toUpperCase() !== "P")
      .forEach((d) => d.parentNode.removeChild(d));
    console.log(doc.body.innerHTML); // "<h1>test</h1><p>a</p><p></p>"

You can wrap the above in a function and modify as you like.
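
As a rough sketch of that wrapping, the hypothetical `removeEmptyTags` helper below parses the string with DOMParser and removes elements that have no child elements and only whitespace text. Its name, the `keepTags` parameter, and the default list are assumptions, not part of the answer. Unlike the snippet above, it also spares a few void elements such as br and img, which never have children and would otherwise be deleted. It needs a browser or another environment that provides DOMParser.

```javascript
// Hypothetical helper: the name, the keepTags parameter, and its defaults
// are illustrative assumptions. Requires an environment with DOMParser.
function removeEmptyTags(html, keepTags = ['P', 'BR', 'HR', 'IMG']) {
  const keep = new Set(keepTags.map((tag) => tag.toUpperCase()));
  const doc = new DOMParser().parseFromString(html, 'text/html');

  // querySelectorAll returns a static list, so removing nodes while
  // iterating is safe.
  for (const el of doc.body.querySelectorAll('*')) {
    // "Empty" here means: no child elements and nothing but whitespace text.
    const isEmpty = el.children.length === 0 && el.textContent.trim() === '';
    if (isEmpty && !keep.has(el.tagName.toUpperCase())) {
      el.remove();
    }
  }
  return doc.body.innerHTML;
}

// Usage (in a browser console):
// removeEmptyTags('<h1>test</h1><h1></h1><p>a</p><p></p><h2></h2>');
```

Like the original regex, this makes a single pass, so a wrapper that only becomes empty after its child is removed is left in place.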

That is a great answer. Is there a more efficient way with jQuery or ES6? Thanks.

Mamun · Accepted Answer · 2018-05-16 08:23:44Z

Add (?!p) to your regex. This is called a negative lookahead:

    let s = "<h1>test</h1><h1></h1><p>a</p><p></p><h2></h2>";
    s = s.trim().replace(/<(?!p)(\w*)\s*[^\/>]*>\s*<\/\1>/g, '');
    console.log(s);
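
One caveat with `(?!p)`: it rejects any tag whose name merely starts with p, so empty pre or picture elements would also be left in place. A possible refinement, assuming only the tag named exactly p should be kept, is to add a word boundary: `(?!p\b)`. The helper name `stripEmptyTagsExceptP` and the sample input containing pre are made up for illustration, and the `i` flag and `\w+` are likewise additions to the answer's pattern.

```javascript
// Hypothetical helper name; the pattern is the answer's regex with a word
// boundary, the i flag, and \w+ added (all assumptions for this sketch).
function stripEmptyTagsExceptP(html) {
  // (?!p\b) rejects only the tag named exactly "p" (any case, via the i flag),
  // so tags such as <pre> are still matched and removed when empty.
  // \w+ instead of \w* avoids matching a bare "<>".
  const emptyTag = /<(?!p\b)(\w+)\s*[^\/>]*>\s*<\/\1>/gi;
  return html.trim().replace(emptyTag, '');
}

const input = '<h1>test</h1><h1></h1><p>a</p><p></p><pre></pre><h2></h2>';
console.log(stripEmptyTagsExceptP(input));
// -> <h1>test</h1><p>a</p><p></p>
```

The usual caveats about matching HTML with a regex still apply: this is a single pass over flat markup and does not handle nesting.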
