I want to extract specific links from a website.

The links look like this:

/topic/Funny/G1pdeJm 

The links are always the same except for the random characters at the end.

I'm having a hard time combining these two checks:

(preg_match("/^http:\/\//i",$str) || is_file($str)) 

and

(preg_match("/Funny(.*)/", $str) || is_file($str)) 

The first check matches every absolute link; the second matches only the links that contain the /topic/Funny/* part.

Unfortunately, I can't combine them. I also want to exclude these paths:

/topic/Funny/viral
/topic/Funny/time
/topic/Funny/top
/topic/Funny/top/week
/topic/Funny/top/month
/topic/Funny/top/year
/topic/Funny/top/all

2 Answers

You could try using a negative lookahead to filter out the URLs you don't want:

.*\/Funny\/(?!viral|time|top\/week|top\/month|top\/year|top\/all|top(\n|$)).* 
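Since the comments ask how to use this pattern from PHP, here is a minimal sketch. It assumes the hrefs have already been collected into an array; the `$links` sample data and the variable names are hypothetical. The pattern is the one above, written with `~` delimiters so the slashes need no escaping.

```php
<?php
// Hypothetical sample of hrefs already pulled from the page.
$links = [
    '/topic/Funny/G1pdeJm',
    '/topic/Funny/viral',
    '/topic/Funny/time',
    '/topic/Funny/top',
    '/topic/Funny/top/week',
    '/topic/Other/abc123',
];

// Same regex as above; "~" delimiters avoid escaping every "/".
$pattern = '~.*/Funny/(?!viral|time|top/week|top/month|top/year|top/all|top(\n|$)).*~';

// preg_grep() keeps only the entries that match; array_values() reindexes from 0.
$kept = array_values(preg_grep($pattern, $links));

print_r($kept);
```

For a single string, `preg_match($pattern, $str)` with the same pattern should behave the same way inside an `if` condition.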



3 Comments

I think it will be nice if you put it inside the preg_match so the OP will understand the usage of this regex :)
honestly i'm a regex guy, not a PHP guy, or I would do that :)
thx a ton! This regex seems perfect :) but unfortunately, as @Dekel said i can't get it to work in my script :/

I'll prepare a battery of test strings and show the implementation of using a regex to filter the URLs.

Regex Breakdown:

^
http://            #match literal characters
[^/]+              #match one or more non-slash characters (domain portion)
/topic/Funny/      #match literal characters
(?!                #not followed by:
  viral            #viral
  |time            #OR time
  |top(?:/week|/month|/year|/all)?   #OR top, top/week, top/month, top/year, top/all
)

Implementation:

$tests = [
    'http://example.com/topic/Funny/G1pdeJm',
    'http://example.com/topic/Funny/viral',
    'http://example.com/topic/Funny/time',
    'http://example.com/topic/Funny/top',
    'http://example.com/topic/Funny/top/week',
    'http://example.com/topic/Funny/top/month',
    'http://example.com/topic/Funny/top/year',
    'http://example.com/topic/Funny/top/all',
    'http://example.com/topic/NotFunny/IL2dsRq',
];

$result = [];
foreach ($tests as $str) {
    // keep only Funny topic URLs that are not followed by a blocked word
    if (preg_match('~^http://[^/]+/topic/Funny/(?!viral|time|top(?:/week|/month|/year|/all)?)~', $str)) {
        $result[] = $str;
    }
}
var_export($result);

Output:

array (
  0 => 'http://example.com/topic/Funny/G1pdeJm',
)
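One caveat: the lookahead above rejects anything that merely starts with a blocked word, so a random ID such as `topXyz9` would be filtered out too. If that can happen on the site, a stricter variant might anchor the lookahead to the end of the string. The sketch below is an assumption-laden variation, not part of the original answer: the function name `isFunnyTopicLink` is hypothetical, it assumes the ID is a single slash-free segment, and it also accepts the relative paths shown in the question.

```php
<?php
// Hypothetical helper; the name and the optional-host rule are assumptions.
function isFunnyTopicLink(string $url): bool
{
    // (?:https?://[^/]+)? makes the scheme and host optional, so relative paths pass too.
    // The lookahead rejects a blocked word only when it is the entire remainder.
    $pattern = '~^(?:https?://[^/]+)?/topic/Funny/'
             . '(?!(?:viral|time|top(?:/(?:week|month|year|all))?)/?$)'
             . '[^/]+$~'; // assumes the ID is one segment with no further slashes

    return preg_match($pattern, $url) === 1;
}

// Hypothetical sample inputs.
$samples = [
    '/topic/Funny/G1pdeJm',
    'http://example.com/topic/Funny/topXyz9',
    '/topic/Funny/top',
    '/topic/Funny/top/week',
    'http://example.com/topic/NotFunny/IL2dsRq',
];

foreach ($samples as $s) {
    echo $s, ' => ', isFunnyTopicLink($s) ? 'keep' : 'skip', PHP_EOL;
}
```

Whether the extra strictness is needed depends on how the site generates its IDs; if they can never begin with a blocked word, the simpler pattern is enough.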
