
I am trying to match URLs with a regular expression that I have already tested, but when I evaluate it in JavaScript it returns false.

Here is my code:

```javascript
var $regex = new RegExp("<a\shref=\"(\#\d+|(https?|ftp):\/\/[-a-z0-9+&@#\/%?=~_|!:,.;\\(\\)]+)\"(\stitle=\"[^\"<>]+\")?\s?>|<\/a>");
var $test = new Array();
$test[0] = '<a href="http://www.nytimes.com/imagepages/2010/09/02/us/HURRICANE.html">';
$test[1] = '<a href="http://www.msnbc.msn.com/id/38877306/ns/weather/%29;">';
$test[2] = '<a href="http://www.msnbc.msn.com/id/38927104" title="dd" alt="dd">';
for (var i = 0; i < $test.length; i++) {
    console.log($test[i]);
    console.log($regex.test($test[i]));
}
```

Does anyone have any idea what is going on?

Comment: Is it Earl? (Sep 2, 2010 at 21:25)

2 Answers


You need to escape backslashes when creating regular expressions with new RegExp(): you pass it a string, and the backslash is also the escape character in string literals.

```javascript
new RegExp("\s");  // becomes /s/
new RegExp("\\s"); // becomes /\s/
```
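One way to check what the constructor actually received is to look at the resulting object's `source` property. The sketch below uses short illustrative patterns rather than the full one from the question, and the variable names are made up for the example.

```javascript
// "\s" is not a recognised string escape, so the backslash is silently dropped
// and the RegExp constructor only ever sees the letter "s".
const broken = new RegExp("\s");
// "\\s" is a string holding a real backslash followed by "s".
const fixed = new RegExp("\\s");

console.log(broken.source);      // s
console.log(fixed.source);       // \s
console.log(broken.test("a b")); // false: there is no letter "s" in "a b"
console.log(fixed.test("a b"));  // true: the space matches \s

// The same thing happens to \d in the question's pattern:
// "\d+" reaches the constructor as "d+", i.e. one or more letters "d".
console.log(new RegExp("\d+").source); // d+
```

This is exactly what happens to every \s, \d and \# in the question's string: each one loses its backslash before the regex engine sees it.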

Or just write your regexps as literals:

```javascript
var re = /\s/;
```
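If the pattern really has to go through new RegExp() as a string (for example because part of it is only known at run time), one option not mentioned in this answer is the ES2015 `String.raw` template tag, which leaves backslashes exactly as written. A minimal sketch, using the question's pattern with the quote escapes removed; `pattern`, `viaRaw` and `viaLiteral` are illustrative names:

```javascript
// String.raw returns the template text with every backslash kept as written,
// so the pattern can be pasted exactly as it would appear in a /.../ literal.
const pattern = String.raw`<a\shref="(#\d+|(https?|ftp):\/\/[-a-z0-9+&@#\/%?=~_|!:,.;\(\)]+)"(\stitle="[^"<>]+")?\s?>|<\/a>`;
const viaRaw = new RegExp(pattern);

// The same pattern written as a regex literal, for comparison.
const viaLiteral = /<a\shref="(#\d+|(https?|ftp):\/\/[-a-z0-9+&@#\/%?=~_|!:,.;\(\)]+)"(\stitle="[^"<>]+")?\s?>|<\/a>/;

console.log(viaRaw.source === viaLiteral.source); // true
console.log(viaRaw.test('<a href="http://www.msnbc.msn.com/id/38877306/ns/weather/%29;">')); // true
```

The trade-off is that a template literal still treats `${` and backticks specially, so those would need care if they ever appear in a pattern.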

Also, if you only want to match URLs, why take the whole HTML tag into account? The following regexp would suffice:

```javascript
var urlReg = /^(?:https?|ftp):\/\/[\w.-]+\/\S*/i; // anything past the third / that is not whitespace is accepted
```
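A quick way to try a URL-only pattern is to run it against the bare URLs from the question's test strings. This sketch assumes the pattern `/^(?:https?|ftp):\/\/[\w.-]+\/\S*/i`, that is, that the intent was to accept http, https and ftp URLs; `urls` and `results` are illustrative names.

```javascript
// Scheme, "://", a host made of word characters, dots and hyphens,
// then "/" followed by any run of non-whitespace characters.
const urlReg = /^(?:https?|ftp):\/\/[\w.-]+\/\S*/i;

const urls = [
  "http://www.nytimes.com/imagepages/2010/09/02/us/HURRICANE.html",
  "http://www.msnbc.msn.com/id/38877306/ns/weather/%29;",
  "http://www.msnbc.msn.com/id/38927104",
];
const results = urls.map((u) => urlReg.test(u));
console.log(results); // [ true, true, true ]

// Because of the ^ anchor, a whole tag does not match:
// the string starts with "<a", not with a scheme.
console.log(urlReg.test('<a href="http://www.msnbc.msn.com/id/38927104">')); // false
```

Note the anchor: this kind of pattern validates a URL you have already extracted, rather than finding one inside a tag.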

Comment:

I can't believe I overlooked that. Thanks for your help; that was my problem. I've been staring at that expression for far too long trying to figure it out. Much appreciated!

There are multiple problems.

You need to escape backslashes. Every backslash in the regular expression, whether it escapes a special character or starts a sequence like \s or \d, must itself be escaped in the string. Effectively, \s has to be written as \\s when you construct it with new RegExp("\\s").

You need to allow more characters in your URLs. Currently the character class contains no uppercase letters and the regex has no i flag, so HURRICANE.html in the first test string can never match. I would propose a character class like [^"] to match everything after http://. (Inside a double-quoted string the " must be escaped, which makes it [^\"].)

You're not taking alt attributes into account. After the optional title attribute the pattern expects > immediately, so the alt="dd" in the third test string prevents a match.
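To see which of these problems affects which test string, one can fix only the escaping (by writing the question's pattern as a literal) and run the three inputs, once as-is and once with the i flag added. A sketch of that check; `original`, `caseless` and `tests` are illustrative names.

```javascript
// The question's pattern, written as a literal so the backslashes survive.
const original = /<a\shref="(#\d+|(https?|ftp):\/\/[-a-z0-9+&@#\/%?=~_|!:,.;\(\)]+)"(\stitle="[^"<>]+")?\s?>|<\/a>/;
// The same pattern with the case-insensitive flag added.
const caseless = new RegExp(original.source, "i");

const tests = [
  '<a href="http://www.nytimes.com/imagepages/2010/09/02/us/HURRICANE.html">',
  '<a href="http://www.msnbc.msn.com/id/38877306/ns/weather/%29;">',
  '<a href="http://www.msnbc.msn.com/id/38927104" title="dd" alt="dd">',
];

// First string fails on the uppercase letters, third fails on alt="dd".
console.log(tests.map((s) => original.test(s))); // [ false, true, false ]
// The i flag fixes the first string, but alt="dd" still blocks the third.
console.log(tests.map((s) => caseless.test(s))); // [ true, true, false ]
```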

A working example:

```javascript
// Ditch new RegExp("...") in favour of /.../ because it is simpler.
var $regex = /<a\shref="(#\d+|(https?|ftp):\/\/[^"]+)"(\stitle="[^"]+")?(\salt="[^"]+")?|<\/a>/;
var $test = new Array();
$test[0] = '<a href="http://www.nytimes.com/imagepages/2010/09/02/us/HURRICANE.html">';
$test[1] = '<a href="http://www.msnbc.msn.com/id/38877306/ns/weather/%29;">';
$test[2] = '<a href="http://www.msnbc.msn.com/id/38927104" title="dd" alt="dd">';
for (var i = 0; i < $test.length; i++) {
    console.log($test[i]);
    console.log($regex.test($test[i]));
}
```

All three examples match this regex.
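Since this regex wraps the URL in the first capture group, exec() can also pull the href value out of each tag rather than just reporting a match. A minimal sketch; `linkRe`, `tags` and `hrefs` are illustrative names, and the closing "</a>" string is added only to show what the second alternative yields.

```javascript
// The answer's regex: capture group 1 holds the href value
// (either a "#123"-style anchor or an http/https/ftp URL).
const linkRe = /<a\shref="(#\d+|(https?|ftp):\/\/[^"]+)"(\stitle="[^"]+")?(\salt="[^"]+")?|<\/a>/;

// The question's three test strings plus a closing tag (added for illustration).
const tags = [
  '<a href="http://www.nytimes.com/imagepages/2010/09/02/us/HURRICANE.html">',
  '<a href="http://www.msnbc.msn.com/id/38877306/ns/weather/%29;">',
  '<a href="http://www.msnbc.msn.com/id/38927104" title="dd" alt="dd">',
  "</a>",
];

// exec() returns the match array, or null when nothing matches.
// For "</a>" the second alternative matches, so group 1 is undefined.
const hrefs = tags.map((tag) => {
  const m = linkRe.exec(tag);
  return m ? m[1] : null;
});

console.log(hrefs);
```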
