
I'm using the following regex to convert URLs to links. It works well, but I've found a bug when the string contains a style attribute with a background image.

    /**
     * Convert urls in a string to a html link
     * @return string
     */
    public static function ConvertUrlsToHtml($str)
    {
        $str = preg_replace(
            '@(?<![.*">])\b(?:(?:https?|ftp|file)://|[a-z]\.)[-A-Z0-9+&#/%=~_|$?!:,.]*[A-Z0-9+&#/%=~_|$]@i',
            '<a href="\0">\0</a>',
            $str
        );
        return $str;
    }

If I use the following...

<div class="inner-left" style="background-image: url(http://www.somewebsite/background.jpg);"></div> 

The background image URL gets converted to a link too.

Does anyone know how I can tweak the regex to ignore URLs inside style attributes?

  • You shouldn't use regex to parse HTML. Use DOM instead. Commented Apr 8, 2013 at 15:01
  • Use a DOM parser to parse HTML; regex becomes a nightmare to handle. Commented Apr 8, 2013 at 15:04
  • That doesn't help. Using DOMDocument doesn't work here, because the input is not necessarily HTML. It's just a string which may or may not contain any HTML. So I need to find all instances of a URL in any given string, be it HTML or not, and then create the HTML link. Commented Apr 8, 2013 at 15:08
  • @SamuelDavidHudson: Using an HTML parser will at least take the HTML markup (which should not be replaced) out of the way and let you work on the text alone. Commented Apr 8, 2013 at 15:20
  • Try adding another negative lookbehind: preg_replace( '@(?<!url\()(?<![.*">])... Commented Apr 8, 2013 at 15:21
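
For illustration, here is a minimal sketch of the lookbehind suggested in the last comment. It assumes the elided part of the comment's pattern is simply the rest of the regex from the question; the function name `convertUrlsSkippingCssUrl` is made up for this example.

```php
<?php
// Sketch only: the question's pattern with an extra negative lookbehind
// so that a URL directly preceded by "url(" is not matched.
// Assumption: the "..." in the comment stands for the rest of the
// original pattern from the question.
function convertUrlsSkippingCssUrl(string $str): string
{
    return preg_replace(
        '@(?<!url\()(?<![.*">])\b(?:(?:https?|ftp|file)://|[a-z]\.)[-A-Z0-9+&#/%=~_|$?!:,.]*[A-Z0-9+&#/%=~_|$]@i',
        '<a href="\0">\0</a>',
        $str
    );
}

// The style attribute from the question is left untouched:
echo convertUrlsSkippingCssUrl(
    '<div class="inner-left" style="background-image: url(http://www.somewebsite/background.jpg);"></div>'
), "\n";

// A URL in plain text is still linked:
echo convertUrlsSkippingCssUrl('Visit http://example.com/page today'), "\n";
// prints: Visit <a href="http://example.com/page">http://example.com/page</a> today
```

Note that this only covers an unquoted URL immediately after `url(`. A single-quoted form such as `url('http://...')` would still be matched, so the lookbehind is a narrow fix rather than a general one.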

1 Answer

You can start by removing HTML tags, because you don't want to replace URLs inside tags. That holds for style=, and equally for <img src=...>, <a href=...> and so on.

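    function ConvertUrlsToHtml($str)
    {
        // Search for URLs in a copy of the string with all tags removed
        $strNoTags = strip_tags($str);
        if (preg_match_all(
            '@(?<![.*">])\b(?:(?:https?|ftp|file)://|[a-z]\.)[-A-Z0-9+&#/%=~_|$?!:,.]*[A-Z0-9+&#/%=~_|$]@i',
            $strNoTags,
            $matches
        )) {
            // array_unique: str_replace already replaces every occurrence,
            // so a URL found twice must not be wrapped a second time
            foreach (array_unique($matches[0]) as $match) {
                $str = str_replace($match, "<a href=\"$match\">$match</a>", $str);
            }
        }
        return $str;
    }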

What it does:

  1. Remove the tags
  2. Find all URLs in the tag-free string
  3. Replace each found URL with a link in the original string
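
One caveat with step 3: `str_replace` works on the original string, so a URL that appears both in the text and inside an attribute is replaced in both places. As a hedged alternative (a different technique from the one above, not part of the original answer), the PCRE verbs `(*SKIP)(*FAIL)` can make a single `preg_replace` step over anything that looks like a tag. The function name `convertUrlsOutsideTags` is made up for this sketch, and the tag pattern `<[^>]*>` is a simplification that assumes no `>` inside attribute values.

```php
<?php
// Sketch only: match tags first and discard them with (*SKIP)(*FAIL),
// so the URL pattern from the question is only tried outside tags.
// Assumption: "<[^>]*>" is good enough to recognise a tag here.
function convertUrlsOutsideTags(string $str): string
{
    return preg_replace(
        '@<[^>]*>(*SKIP)(*FAIL)|(?<![.*">])\b(?:(?:https?|ftp|file)://|[a-z]\.)[-A-Z0-9+&#/%=~_|$?!:,.]*[A-Z0-9+&#/%=~_|$]@i',
        '<a href="\0">\0</a>',
        $str
    );
}

echo convertUrlsOutsideTags(
    'See http://example.com <img src="http://example.com/a.png"> end'
), "\n";
// prints: See <a href="http://example.com">http://example.com</a> <img src="http://example.com/a.png"> end
```

Because the lookbehind from the question is still in place, text that directly follows a `>` (such as the label of an existing link) is left alone, as before.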

As noted in the comments, you could also use an HTML parser instead of strip_tags to extract the text.
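
A rough sketch of that parser-based route, using PHP's DOMDocument and DOMXPath, might look like the following. The function name `linkifyTextNodes` and the wrapping `<div>` are assumptions made for this example. The output is re-serialized by the parser, so it can differ from the input (for instance, a bare `&` in text comes back as `&amp;`), and non-ASCII input may need an explicit charset hint. For a string that may not be HTML at all, that trade-off needs checking before relying on it.

```php
<?php
// Sketch only: load the string as an HTML fragment, then rewrite URLs
// in text nodes, never in attributes or inside existing <a> elements.
function linkifyTextNodes(string $html): string
{
    $pattern = '@(?<![.*">])\b(?:(?:https?|ftp|file)://|[a-z]\.)[-A-Z0-9+&#/%=~_|$?!:,.]*[A-Z0-9+&#/%=~_|$]@i';

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    // Hypothetical wrapper <div> so plain text and fragments load too
    $doc->loadHTML('<html><body><div>' . $html . '</div></body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = '//text()[not(ancestor::a or ancestor::script or ancestor::style)]';
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        $text = $node->nodeValue;
        if (!preg_match_all($pattern, $text, $m, PREG_OFFSET_CAPTURE)) {
            continue;
        }
        // Rebuild the text node as: text, <a>, text, <a>, ..., text
        $fragment = $doc->createDocumentFragment();
        $pos = 0;
        foreach ($m[0] as [$url, $offset]) {
            $fragment->appendChild($doc->createTextNode(substr($text, $pos, $offset - $pos)));
            $a = $doc->createElement('a');
            $a->setAttribute('href', $url);
            $a->appendChild($doc->createTextNode($url));
            $fragment->appendChild($a);
            $pos = $offset + strlen($url);
        }
        $fragment->appendChild($doc->createTextNode(substr($text, $pos)));
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of the wrapper <div>
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo linkifyTextNodes('See http://example.com/page now'), "\n";
```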
