I use PHP and get the following text string from a textbox.

This is a string I have:

header1 header2 edit b-1246431 12.01.13 1246431 | blog.domain.net 1232,00 ‌‌ details b-1312231 12.01.13 1246431 | blog.domain.co.uk 12312,00 b-2344311 12.01.13 1246431 | www.domain.com/ 9129,00 ‌‌ b-2344322 12.01.13 1246431 | http://abc.de 1332,00 ‌‌ b-2344322 13.01.13 1246431 | www.cdf.de/ 21140,00 ‌‌edit b-1233422 06.01.13 1246431 | www.dto.de/site1 21110,00 b-1233542 06.01.13 1246431 | www.ghj.ca/site2.html 28110,00 ‌‌ edit b-1231242 06.01.13 1246431 | www.another.de 2101,00 ‌‌ b-1231231 04.01.13 1246431 | onlyme.info/ 

I want this output:

blog.domain.net blog.domain.co.uk www.domain.com/ http://abc.de www.cdf.de/ www.dto.de/site1 www.ghj.ca/site2.html www.another.de onlyme.info/ 

The string will change, and I always need only the URLs extracted. The difficulty is that URLs sometimes start with www, sometimes with http, and sometimes with neither. They should still be recognized as URLs.

I already looked at these posts: "extracting one or more urls from a string in php" and http://daringfireball.net/2010/07/improved_regex_for_matching_urls

... but nothing worked for my text string.

  • Looks organized enough. Why not explode twice with | and a space? Commented Feb 20, 2014 at 4:44
  • Every string will be different. The next string might not have '|' ... Commented Feb 20, 2014 at 4:49
  • When you don't have the |, will it be replaced by another separator? If yes, you can split the string by spaces and take the 5th column of the result, since the URL will be in the 5th place. Also, will the URLs always be hostnames and never IP addresses? That helps, too, if everything before the URLs consists of digits. Commented Feb 20, 2014 at 4:59

2 Answers

Try it with a regular expression:

    <?php
    $input = "header1 header2 edit b-1246431 12.01.13 1246431 | blog.domain.net 1232,00 ‌‌ details b-1312231 12.01.13 1246431 | blog.domain.co.uk 12312,00 b-2344311 12.01.13 1246431 | www.domain.com/ 9129,00 ‌‌ b-2344322 12.01.13 1246431 | http://abc.de 1332,00 ‌‌ b-2344322 13.01.13 1246431 | www.cdf.de/ 21140,00 ‌‌edit b-1233422 06.01.13 1246431 | www.dto.de/site1 21110,00 b-1233542 06.01.13 1246431 | www.ghj.ca/site2.html 28110,00 ‌‌ edit b-1231242 06.01.13 1246431 | www.another.de 2101,00 ‌‌ b-1231231 04.01.13 1246431 | onlyme.info/";

    preg_match_all(
        '#[-a-zA-Z0-9@:%_\+.~\#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~\#?&//=]*)?#si',
        $input,
        $result
    );

    foreach ($result[0] as $url) {
        echo $url . "<br />\n";
    }

Or see my PHPFiddle here: PHPFiddle
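If the extraction is needed in more than one place, the same pattern can be wrapped in a small function. The following is only a sketch: the name extractUrls and the shortened sample string are made up for illustration, and the type declarations assume PHP 7.0 or later.

```php
<?php
// Hypothetical helper (not part of the answer above): returns every
// substring of $text that the answer's pattern recognizes as a URL.
function extractUrls(string $text): array
{
    // Same pattern as in the answer: host-like characters, a dot,
    // a 2-4 letter TLD, and an optional path starting with "/".
    $pattern = '#[-a-zA-Z0-9@:%_\+.~\#?&//=]{2,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~\#?&//=]*)?#si';
    preg_match_all($pattern, $text, $matches);

    // $matches[0] holds the full matches; it is empty when nothing matched.
    return $matches[0];
}

// Shortened sample in the same layout as the question's string.
$sample = 'b-1246431 12.01.13 1246431 | blog.domain.net 1232,00 '
        . 'b-2344322 12.01.13 1246431 | http://abc.de 1332,00';

echo implode(' ', extractUrls($sample)), "\n";
// prints: blog.domain.net http://abc.de
```

The dates (12.01.13) and prices (1232,00) are skipped because the part after the last dot has to consist of letters. One limitation to keep in mind: the {2,4} quantifier means hosts with a longer top-level domain (for example .online) would not be matched unless the upper bound is raised.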


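Try this, assuming $s holds the text with one record per line:

    $lines = explode("\n", $s);
    foreach ($lines as $line) {
        // Only lines containing "|" carry a URL.
        if (strpos($line, "|") !== false) {
            // First space-delimited token after the first "|".
            $url = trim(explode(" ", trim(explode('|', $line)[1]))[0]);
            echo $url."<BR>";
        }
    }

Works on PHP 5.4+, because the explode(...)[1] array dereferencing syntax was introduced in PHP 5.4.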
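If the text arrives as one long line with several | separators, as in the string shown in the question, a line-based split only finds the first URL. A possible variation, sketched here under the assumption that every URL directly follows a |, is to split on the separator itself. The function name urlsAfterPipes and the sample string are made up for illustration, and the type declarations assume PHP 7.0 or later.

```php
<?php
// Hypothetical helper: returns the first whitespace-delimited token
// that follows each "|" in $text, regardless of line breaks.
function urlsAfterPipes(string $text): array
{
    $segments = explode('|', $text);
    array_shift($segments); // discard everything before the first "|"

    $urls = [];
    foreach ($segments as $segment) {
        // Split the segment on runs of whitespace, ignoring empty pieces.
        $tokens = preg_split('/\s+/', trim($segment), -1, PREG_SPLIT_NO_EMPTY);
        if (!empty($tokens)) {
            $urls[] = $tokens[0];
        }
    }
    return $urls;
}

// Two records on the first line, one on the second.
$sample = "b-1 12.01.13 1246431 | blog.domain.net 1232,00 b-2 12.01.13 1246431 | www.domain.com/ 9129,00\n"
        . "b-3 04.01.13 1246431 | onlyme.info/";

echo implode(' ', urlsAfterPipes($sample)), "\n";
// prints: blog.domain.net www.domain.com/ onlyme.info/
```

Like the loop above, this depends entirely on the | separator being present. For input without it, a pattern-based approach is the safer choice.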
