
I want to check whether a URL string matches a keyword. For example, if the keyword is google.com, then google.com and https://google.com should return true, and google.com/search or something similar should also return true, but google.com.id should return false because it is a different URL. I tried the expression below, but it doesn't work. How should I write the regular expression? Thank you.

regexp.MatchString(`^(?:https?://)?([-a-z0-9]+)(?:\.`+keyword+`)*$`, urlstr) 

By the way, as far as I understand, regular expressions can cause performance issues. Can anyone suggest other ways to handle this?

1 Answer

You can use

regexp.MatchString(`^(?:https?://)?(?:[^/.\s]+\.)*` + regexp.QuoteMeta(keyword) + `(?:/[^/\s]+)*/?$`, urlstr)

See the regex demo.

Details:

  • ^ - start of string
  • (?:https?://)? - an optional http:// or https://
  • (?:[^/.\s]+\.)* - zero or more repetitions of
    • [^/.\s]+ - one or more chars other than /, . and whitespace
    • \. - a dot
  • google\.com - an escaped keyword
  • (?:/[^/\s]+)* - zero or more repetitions of a / and then one or more chars other than / and whitespace chars
  • /? - an optional /
  • $ - end of string

Note that you need regexp.QuoteMeta to escape any special characters in the keyword, such as ., which otherwise matches any character except line break characters by default.
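As a rough illustration of the pattern described above, here is a minimal sketch that compiles the expression once and reuses it. The helper name newKeywordMatcher is hypothetical, not something from the answer. Because of the (?:[^/.\s]+\.)* part, the pattern also accepts subdomains such as mail.google.com.

```go
package main

import (
	"fmt"
	"regexp"
)

// newKeywordMatcher is a hypothetical helper: it builds the pattern from the
// answer once, so the regex is not recompiled on every call.
func newKeywordMatcher(keyword string) (*regexp.Regexp, error) {
	pattern := `^(?:https?://)?(?:[^/.\s]+\.)*` +
		regexp.QuoteMeta(keyword) + // escape ".", "+", etc. in the keyword
		`(?:/[^/\s]+)*/?$`
	return regexp.Compile(pattern)
}

func main() {
	re, err := newKeywordMatcher("google.com")
	if err != nil {
		panic(err)
	}
	// Sample inputs taken from the question.
	for _, u := range []string{
		"google.com",
		"https://google.com",
		"google.com/search",
		"google.com.id",
	} {
		fmt.Printf("%-20s %v\n", u, re.MatchString(u))
	}
}
```

Compiling once with regexp.Compile and calling MatchString on the result avoids the per-call compilation that the package-level regexp.MatchString performs, which is usually the main cost when many URLs are checked against the same keyword.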


5 Comments

Thank you, this seems to work. Could the regular expression cause performance issues? Are there other ways to handle it?
@Frank I think this regex is efficient enough not to cause any performance problems. Note that I assumed there will be no whitespace in the URL. If you need to support whitespace, remove every \s from the pattern.
Understood, thank you. Let me benchmark it.
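For a benchmark like the one mentioned here, a regex-free candidate to compare against might look like the sketch below. The function name matchesKeyword is hypothetical, and the check assumes lowercase input with no port, userinfo, or query string directly after the host. It is also more lenient than the regex: it does not reject whitespace or empty path segments.

```go
package main

import (
	"fmt"
	"strings"
)

// matchesKeyword is a hypothetical regex-free check using only the strings
// package. Assumption: the host is followed by "/" or the end of the string.
func matchesKeyword(urlstr, keyword string) bool {
	// Drop one optional leading scheme.
	rest := strings.TrimPrefix(urlstr, "https://")
	if rest == urlstr {
		rest = strings.TrimPrefix(urlstr, "http://")
	}
	// Keep only the host: everything before the first "/".
	host := rest
	if i := strings.IndexByte(rest, '/'); i >= 0 {
		host = rest[:i]
	}
	// Accept the keyword itself or any subdomain of it.
	return host == keyword || strings.HasSuffix(host, "."+keyword)
}

func main() {
	for _, u := range []string{
		"google.com",
		"https://google.com",
		"google.com/search",
		"google.com.id",
	} {
		fmt.Printf("%-20s %v\n", u, matchesKeyword(u, "google.com"))
	}
}
```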
Hi, I found a bug: if the keyword is https://google.com (with https) but urlstr is google.com (without https), the result is false, whereas true is expected. How should the expression above be changed?
@Frank This is a problem with your data/task definition. If keywords may come with or without a protocol, make sure you only compare the parts without the protocol: since the pattern already makes the protocol optional in urlstr, simply remove it from the keyword with strings.Replace(strings.Replace(keyword, "https://", "", 1), "http://", "", 1) before checking.
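The normalization suggested in this comment could be sketched as follows. The names stripScheme and matchURL are hypothetical, and strings.TrimPrefix is swapped in for the nested strings.Replace calls so that only a leading protocol is removed, not one that appears later in the string.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// stripScheme is a hypothetical helper that removes one leading
// "https://" or "http://" and leaves everything else untouched.
func stripScheme(s string) string {
	if t := strings.TrimPrefix(s, "https://"); t != s {
		return t
	}
	return strings.TrimPrefix(s, "http://")
}

// matchURL normalizes the keyword, then applies the pattern from the answer.
// The pattern itself already treats the protocol in urlstr as optional.
func matchURL(urlstr, keyword string) (bool, error) {
	pattern := `^(?:https?://)?(?:[^/.\s]+\.)*` +
		regexp.QuoteMeta(stripScheme(keyword)) +
		`(?:/[^/\s]+)*/?$`
	return regexp.MatchString(pattern, urlstr)
}

func main() {
	// The case from the comment: keyword with protocol, URL without.
	ok, err := matchURL("google.com", "https://google.com")
	fmt.Println(ok, err)
}
```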
