I'm cleaning the output of a WYSIWYG editor which, instead of inserting a line break, creates an empty p tag. It sometimes creates other empty tags as well, and those aren't needed.

I have a regex that removes all empty tags, but I want to exclude empty p tags from it. How do I do that?

    let s = "<h1>test</h1><h1></h1><p>a</p><p></p><h2></h2>";
    s = s.trim().replace( /<(\w*)\s*[^\/>]*>\s*<\/\1>/g, '' );
    console.log(s);

  • Hi there - a bit of Stack Overflow tradition is to share this link with anyone attempting to use regex to match HTML content - stackoverflow.com/questions/1732348/… Commented May 16, 2018 at 8:09
  • I'd suggest using a dedicated HTML parsing library to perform this task since there are so many edge cases that you would need to handle - it will get very complex and hard to manage. Commented May 16, 2018 at 8:10
  • HTML vs regex - everlasting war :) Commented May 16, 2018 at 8:10
  • Your case doesn't really fall within the exceptions where regex could be suitable for HTML, I fear. If using a parser bothers you, you can still inject the markup into a div and filter the content using the DOM. Commented May 16, 2018 at 8:11

3 Answers

You can use DOMParser to be on the safe side.

    let s = "<h1>test</h1><h1></h1><p>a</p><p></p><h2></h2>";
    const parser = new DOMParser();
    const doc = parser.parseFromString(s, 'text/html');
    const elems = doc.body.querySelectorAll('*');

    // Remove every element that has no text content, except <p>
    [...elems].forEach(el => {
      if (el.textContent === '' && el.tagName !== 'P') {
        el.remove();
      }
    });
    console.log(doc.body.innerHTML);

I understand that you want to use regex for that, but there are better ways. Consider using DOMParser:

    var x = "<h1>test</h1><h1></h1><p>a</p><p></p><h2></h2>";
    var parse = new DOMParser();
    var doc = parse.parseFromString(x, "text/html");

    // Drop every element that has no child nodes, unless it is a <p>
    Array.from(doc.body.querySelectorAll("*"))
      .filter((d) => !d.hasChildNodes() && d.tagName.toUpperCase() !== "P")
      .forEach((d) => d.parentNode.removeChild(d));
    console.log(doc.body.innerHTML); // "<h1>test</h1><p>a</p><p></p>"

You can wrap the above in a function and modify as you like.
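As a rough sketch of what such a wrapper could look like: the function name `removeEmptyElements` and the `keepTags` parameter are made up for illustration, and the code assumes a browser environment where `DOMParser` is available.

```javascript
// Hypothetical helper: `removeEmptyElements` and `keepTags` are illustrative names.
// Assumes a browser (or any environment that provides DOMParser).
function removeEmptyElements(html, keepTags = ['P']) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  // tagName is upper-case for elements in an HTML document
  const keep = new Set(keepTags.map((t) => t.toUpperCase()));

  // querySelectorAll returns a static NodeList, so removing while iterating is safe
  doc.body.querySelectorAll('*').forEach((el) => {
    if (!el.hasChildNodes() && !keep.has(el.tagName)) {
      el.remove();
    }
  });
  return doc.body.innerHTML;
}

// Only run the demo where DOMParser exists (it is not a Node.js global)
if (typeof DOMParser !== 'undefined') {
  console.log(removeEmptyElements('<h1>test</h1><h1></h1><p>a</p><p></p><h2></h2>'));
}
```

Note that elements which never have children, such as br or img, also count as "empty" under this check, so they would need to be added to `keepTags` if they should survive.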

1 Comment

That is a great answer. Is there a more efficient way with jQuery or ES6? Thanks

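Add (?!p\b) to your regex. This is called a negative lookahead. The \b matters: a bare (?!p) would also skip every other tag whose name starts with p, such as pre:

    let s = "<h1>test</h1><h1></h1><p>a</p><p></p><h2></h2>";
    s = s.trim().replace( /<(?!p\b)(\w*)\s*[^\/>]*>\s*<\/\1>/g, '' );
    console.log(s);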
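If the regex route is kept, the lookahead can be folded into a small reusable function. The following is only a sketch under some assumptions: `stripEmptyTags` and its `keep` parameter are hypothetical names, the kept names are assumed to be plain tag names with no regex metacharacters, and the pattern places `\b` after the kept name so that a tag like pre is not mistaken for p. It repeats the replacement until nothing changes, so wrappers that become empty after their children are removed are caught as well.

```javascript
// Hypothetical helper: `stripEmptyTags` and `keep` are illustrative names.
// Assumes `keep` holds plain tag names (no regex metacharacters).
function stripEmptyTags(html, keep = ['p']) {
  // Negative lookahead that rejects the kept tag names; \b makes "p" not match "pre"
  const guard = keep.length > 0 ? `(?!(?:${keep.join('|')})\\b)` : '';
  const pattern = new RegExp(`<${guard}(\\w+)\\b[^/>]*>\\s*</\\1>`, 'gi');

  // Repeat until stable so that <div><span></span></div> collapses completely
  let previous;
  do {
    previous = html;
    html = html.replace(pattern, '');
  } while (html !== previous);
  return html;
}

console.log(stripEmptyTags('<h1>test</h1><h1></h1><p>a</p><p></p><h2></h2>'));
```

This still shares the limits of any regex-based HTML cleanup mentioned in the comments on the question, so it is best suited to simple, predictable editor output.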
