I have this text:
    $string = "this is my friend's website http://example.com I think it is cool";

How can I extract the link into another variable?
I know this should be done with a regular expression, probably preg_match(), but I don't know how.
Probably the safest way is to reuse code from WordPress. Download the latest release (currently 3.1.1) and look at wp-includes/formatting.php. It has a function named make_clickable that takes plain text as its parameter and returns a formatted string. You can borrow its code for extracting URLs, although it is pretty complex.
This one-line regex might be helpful:

    preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $string, $match);

But this regex still can't filter out some malformed URLs (e.g. http://google:ha.ckers.org).
I tried to do as Nobu said and use the WordPress code, but it has too many dependencies on other WordPress functions. Instead I took Nobu's regular expression for preg_match_all() and turned it into a function built on preg_replace_callback(), which replaces all URLs in a text with clickable links. It uses an anonymous function, so you'll need PHP 5.3 or later; alternatively, rewrite the code to use an ordinary named function.
    <?php
    /**
     * Make clickable links from URLs in text.
     * Assumes $text is plain, not yet HTML-escaped text.
     */
    function make_clickable($text) {
        $regex = '#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#';
        return preg_replace_callback($regex, function ($matches) {
            // Escape the URL so a quote or ampersand in it cannot break the attribute.
            $url = htmlspecialchars($matches[0], ENT_QUOTES, 'UTF-8');
            return "<a href='{$url}'>{$url}</a>";
        }, $text);
    }

URLs have a quite complex definition, so you must first decide what you want to capture. A simple example capturing anything starting with http:// or https:// could be:
    preg_match_all('!https?://\S+!', $string, $matches);
    $all_urls = $matches[0];

Note that this is very basic and could capture invalid URLs. I would recommend catching up on POSIX and PHP regular expressions for more complex things.
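Since \S+ also swallows trailing punctuation and accepts strings that are not really URLs, the matches can be filtered afterwards. The following is only a minimal sketch, not a complete validator: extract_valid_urls() is a hypothetical helper name, and the set of trailing characters it trims is an assumption that suits ordinary prose but would also strip a closing bracket that legitimately ends a URL.

```php
<?php
/**
 * Hypothetical helper: extract http(s) URLs from text, then drop obvious junk.
 */
function extract_valid_urls(string $text): array
{
    preg_match_all('!https?://\S+!', $text, $matches);

    $urls = [];
    foreach ($matches[0] as $candidate) {
        // Assumption: punctuation at the very end belongs to the sentence, not the URL.
        $candidate = rtrim($candidate, '.,;:!?)');

        // filter_var() returns false when the candidate is not a valid URL.
        if (filter_var($candidate, FILTER_VALIDATE_URL) !== false) {
            $urls[] = $candidate;
        }
    }
    return $urls;
}

$string = "this is my friend's website http://example.com, I think it is cool.";
$urls = extract_valid_urls($string);
$link = $urls[0] ?? null; // the first URL, or null when none was found
```

If only the first link is needed, as in the question, taking the first element of the returned array is enough.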
The code that worked for me (especially if you have several links in your $string):
    $string = "this is my friend's website https://www.example.com I think it is cool, but this one is cooler https://www.stackoverflow.com :)";
    $regex = '/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i';
    preg_match_all($regex, $string, $matches);
    $urls = $matches[0];

    // go over all links
    foreach ($urls as $url) {
        echo $url . '<br />';
    }

Hope that helps others as well.
If the text you extract the URLs from is user-submitted and you're going to display the result as links anywhere, you have to be very, VERY careful to avoid XSS vulnerabilities: most prominently "javascript:" protocol URLs, but also malformed URLs that might trick your regexp and/or the displaying browser into executing them as JavaScript URLs. At the very least, you should accept only URLs that start with "http", "https" or "ftp".
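As a rough illustration of that minimum check, the sketch below accepts only URLs that begin with one of the three schemes named above and escapes everything it prints. Both function names are hypothetical, and this covers only the scheme check and output escaping; it is not a complete XSS defence.

```php
<?php
/**
 * Hypothetical helper: true only for URLs that start with an allowed scheme.
 */
function has_allowed_scheme(string $url): bool
{
    // Anchored at the start, so "javascript:..." or a URL with leading junk is rejected.
    return preg_match('#^(?:https?|ftp)://#i', $url) === 1;
}

/**
 * Hypothetical helper: render a user-submitted URL as a link only when its
 * scheme is allowed; otherwise return it as escaped plain text.
 */
function render_user_link(string $url): string
{
    $escaped = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    if (!has_allowed_scheme($url)) {
        return $escaped;
    }
    return '<a href="' . $escaped . '" rel="nofollow">' . $escaped . '</a>';
}

echo render_user_link('http://example.com/?a=1&b=2'), "\n";
echo render_user_link('javascript:alert(1)'), "\n";
```

The first call produces an anchor element with the ampersand escaped; the second is returned as plain text because its scheme is not on the list.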
There's also a blog entry by Jeff where he describes some other problems with extracting URLs.
    preg_match_all('/[a-z]+:\/\/\S+/', $string, $matches);

This is an easy way that works for a lot of cases, but not all. All the matches are put in $matches. Note that this does not cover links in anchor elements (<a href=""...), but those weren't in your example either.
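For the anchor-element case that the regex does not cover, one option is to parse the markup instead of matching it. This is a sketch under the assumption that PHP's DOM extension is available; extract_anchor_hrefs() is a hypothetical helper name, not part of any library.

```php
<?php
/**
 * Hypothetical helper: collect the href value of every <a> element in an HTML string.
 */
function extract_anchor_hrefs(string $html): array
{
    if ($html === '') {
        return []; // nothing to parse
    }

    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if ($anchor->hasAttribute('href')) {
            $hrefs[] = $anchor->getAttribute('href');
        }
    }
    return $hrefs;
}

$html = '<p>See <a href="http://example.com">my site</a> and <a name="top">no link</a>.</p>';
$hrefs = extract_anchor_hrefs($html);
```

Anchors without an href attribute, such as the second one in the sample markup, are skipped.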
You could try this to find the links and rewrite them as anchors (adding the href):

    $reg_exUrl = '/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/';

    // The text you want to filter for URLs
    $text = "The text you want to filter goes here. http://example.com";

    if (preg_match($reg_exUrl, $text)) {
        // $0 is the whole match, so every URL found links to itself
        echo preg_replace($reg_exUrl, '<a href="$0">$0</a> ', $text);
    } else {
        echo "No url in the text";
    }

Reference: http://php.net/manual/en/function.preg-match.php
There are a lot of edge cases with URLs: a URL may contain brackets, may lack a protocol, and so on. That's why a regex alone is not enough.
I created a PHP library that could deal with lots of edge cases: Url highlight.
Example:
    <?php
    use VStelmakh\UrlHighlight\UrlHighlight;

    $urlHighlight = new UrlHighlight();
    $urlHighlight->getUrls("this is my friend's website http://example.com I think it is cool");
    // return: ['http://example.com']

For more details see the readme. For the URL cases covered, see the tests.
Here is a function I use. I can't remember where it came from, but it seems to do a pretty good job of finding links in text and turning them into clickable links.
You can change the function to suit your needs. I just wanted to share this as I was looking around and remembered I had this in one of my helper libraries.
    function make_links($str) {
        $pattern = '(?xi)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))';
        return preg_replace_callback("#$pattern#i", function ($matches) {
            $input = $matches[0];
            // Prepend a protocol when the match has none (e.g. www.example.com)
            $url = preg_match('!^https?://!i', $input) ? $input : "http://$input";
            return '<a href="' . $url . '" rel="nofollow" target="_blank">' . "$input</a>";
        }, $str);
    }

Use:

    // Double quotes, because the text itself contains an apostrophe
    $subject = "this is a link http://google:ha.ckers.org maybe don't want to visit it?";
    echo make_links($subject);

Output:
    this is a link <a href="http://google:ha.ckers.org" rel="nofollow" target="_blank">http://google:ha.ckers.org</a> maybe don't want to visit it?

    <?php
    // Captures href and src attribute values from the markup in $webpage_content
    preg_match_all('/(href|src)[\s]?=[\s\"\']?+(.*?)[\s\"\']+.*?/', $webpage_content, $link_extracted);

This regex works great for me, and I have checked it with many types of URL:

    <?php
    $string = "Thisregexfindurlhttp://www.rubular.com/r/bFHobduQ3n mixedwithstring";
    preg_match_all('/(https?|ssh|ftp):\/\/[^\s"]+/', $string, $url);
    $all_url = $url[0];    // array of all URLs found
    $one_url = $url[0][0]; // the first URL in that array
    ?>

The URLs it was checked against can be found here: http://www.rubular.com/r/bFHobduQ3n
    function find_links($post_content) {
        $reg_exUrl = '/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/';

        // Turn every URL into a hyperlink in a single pass; $0 is the matched URL.
        // A single pass avoids replacing again inside links that were already
        // created when the same URL occurs more than once.
        // Text without URLs is returned unchanged.
        return preg_replace($reg_exUrl, '<a href="$0" rel="nofollow"> LINK </a>', $post_content);
    }