5

I have a script which allows to replace undesired HTML tags and escape quotes to "improve" security and prevent mainly script tag and onload injection, etc.... This script is used to "texturize" content retrieved from innerHTML.

However, it multiples near by 3 my execution time (in a loop). I would like to know if there is a better way or better regex to do it:

function safe_content( text ) { text = text.replace( /<script[^>]*>.*?<\/script>/gi, '' ); text = text.replace( /(<p[^>]*>|<\/p>)/g, '' ); text = text.replace( /'/g, '&#8217;' ).replace( /&#039;/g, '&#8217;' ).replace( /[\u2019]/g, '&#8217;' ); text = text.replace( /"/g, '&#8221;' ).replace( /&#034;/g, '&#8221;' ).replace( /&quot;/g, '&#8221;' ).replace( /[\u201D]/g, '&#8221;' ); text = text.replace( /([\w]+)=&#[\d]+;(.+?)&#[\d]+;/g, '$1="$2"' ); return text.trim(); }; 

EDIT: here a fiddle: https://fiddle.jshell.net/srnoe3s4/1/. Fiddle don't like script tags in javascript string apparently so I didn't add it.

15
  • 5
    Wrong. Client side javascript sanitization can't help avoiding hack attempts. Commented Apr 21, 2017 at 9:40
  • 1
    If you are interested in fewer replace() calls, you may join those with the same replacement patterns using alternation - text = text.replace( /'|&#039;|\u2019/g, '&#8217;' ); and text = text.replace( /"|&(?:#034|quot);|\u201D/g, '&#8221;' ); Commented Apr 21, 2017 at 9:45
  • 1
    @musefan I just tested it, I think you're right. Evaluatin happens when they get appended to the DOM/ Commented Apr 21, 2017 at 9:51
  • 2
    Just one of the problems with your regex approach. Try running this string through it: <scri<script></script>pt>some malicious code here.</script> Commented Apr 21, 2017 at 10:10
  • 1
    There are a lot of questions on SO and search results on Google about this topic. Just e.g. stackoverflow.com/questions/295566/… Don't do such things on your own, if it is not the main purpose of your product. Commented Apr 21, 2017 at 10:40

1 Answer 1

-1

I'll just deal with performance and naive security checks since writing a sanitizer is not something you can do on the corner of a table. If you want to save time, avoid calling multiple times replace() if you replace with the same value, which leads you to this:

function safe_content( text ) { text = text.replace( /<script[^>]*>.*?<\/script>|(<\/?p[^>]*>)/gi, '' ); text = text.replace( /'|&#039;|[\u2019]/g, '&#8217;'); text = text.replace( /"|&#034;|&quot;|[\u201D]/g, '&#8221;' ) text = text.replace( /([\w]+)=&#[\d]+;(.+?)&#[\d]+;/g, '$1="$2"' ); return text.trim(); }; 

If you take into account dan1111's comment about weird string input that will break this implementation, you can add while(/foo/.test(input)) to avoid the issue:

function safe_content( text ) { while(/<script[^>]*>.*?<\/script>|(<\/?p[^>]*>)/gi.test(text)) text = text.replace( /<script[^>]*>.*?<\/script>|(<\/?p[^>]*>)/gi, '' ); while(/'|&#039;|[\u2019]/g.test(text)) text = text.replace( /'|&#039;|[\u2019]/g, '&#8217;'); while(/"|&#034;|&quot;|[\u201D]/g.test(text)) text = text.replace( /"|&#034;|&quot;|[\u201D]/g, '&#8221;' ) while(/([\w]+)=&#[\d]+;(.+?)&#[\d]+;/g.test(text)) text = text.replace( /([\w]+)=&#[\d]+;(.+?)&#[\d]+;/g, '$1="$2"' ); return text.trim(); }; 

in standard tests cases, this will not be a lot slower than the previous code. But if the input enter in the scope of dan1111's comment, it might be slower. See perf demo

Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.