
This question has been asked here before, but I'm completely unsatisfied with the answers. I need a way to compare two URLs for equality, and ideally I won't be writing it by hand. The library needs to understand that the URLs in each of these pairs are equal:

    http://stackoverflow.com
    https://stackoverflow.com/

    https://stackoverflow.com/questions/ask
    https://stackoverflow.com/questions/ask/

    http://stackoverflow.com?paramName=
    http://stackoverflow.com?paramName

    http://stackoverflow.com?paramName1=value1&paramName2=value2
    http://stackoverflow.com?paramName2=value2&paramName1=value1

    http://stackoverflow.com?param name 1=value 1
    http://stackoverflow.com?param%20name%201=value%201

These URLs are not equal:

    https://stackoverflow.com/questions/ask
    https://stackoverflow.com/questionz/ask

    http://stackoverflow.com?paramName1=value1&paramName2=value2
    http://stackoverflow.com?paramName1=value1&paramName2=value3

And other complicated things like this. Where can I find such a library?

BTW, here is a unit test of this:

    import org.junit.Test;

    import java.net.URI;
    import java.net.URISyntaxException;

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertNotSame;

    public class UriTest {

        @Test
        public void equality() throws URISyntaxException {
            assertUrlsEqual("http://stackoverflow.com", "https://stackoverflow.com/");
            assertUrlsEqual("https://stackoverflow.com/questions/ask", "https://stackoverflow.com/questions/ask/");
            assertUrlsEqual("http://stackoverflow.com?paramName=", "http://stackoverflow.com?paramName");
            assertUrlsEqual("http://stackoverflow.com?paramName1=value1&paramName2=value2", "http://stackoverflow.com?paramName2=value2&paramName1=value1");
            assertUrlsEqual("http://stackoverflow.com?param name 1=value 1", "http://stackoverflow.com?param%20name%201=value%201");
        }

        @Test
        public void notEqual() throws URISyntaxException {
            assertUrlsNotEqual("https://stackoverflow.com/questions/ask", "https://stackoverflow.com/questionz/ask");
            assertUrlsNotEqual("http://stackoverflow.com?paramName1=value1&paramName2=value2", "http://stackoverflow.com?paramName1=value1&paramName2=value3");
        }

        private void assertUrlsNotEqual(String u1, String u2) throws URISyntaxException {
            //...?
        }

        private void assertUrlsEqual(String u1, String u2) throws URISyntaxException {
            //...?
        }
    }
  • It appears you just want to see if the base URL is the same. Why don't you try calling getHost() on your URL and see if it equals the other URL's host? Commented Aug 16, 2013 at 19:36
  • stackoverflow.com and stackoverflow.com/ really aren't equal by specification. They only happen to be equivalent for your purpose. This is why your requirement is not the stuff of public libs. Commented Aug 16, 2013 at 19:37

2 Answers


java.net.URI compares two URLs without making network requests (unlike java.net.URL, whose equals method may resolve host names), and you can use its normalize method to put a URL with an absolute path into canonical path form.
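As a minimal sketch of what that gives you (the class and method names here are made up for illustration): normalize() removes "." and ".." path segments, and URI.equals ignores case in the scheme and host, but a trailing slash still counts as a difference.

```java
import java.net.URI;

final class UriEqualityDemo {

    /** Compares two URL strings after path normalization; no network access involved. */
    static boolean sameAfterNormalize(String a, String b) {
        return URI.create(a).normalize().equals(URI.create(b).normalize());
    }

    public static void main(String[] args) {
        // "." and ".." segments are collapsed before comparing.
        System.out.println(sameAfterNormalize(
                "https://stackoverflow.com/questions/./ask",
                "https://stackoverflow.com/questions/tags/../ask"));

        // Scheme and host are compared without regard to case.
        System.out.println(sameAfterNormalize(
                "HTTPS://StackOverflow.com/questions/ask",
                "https://stackoverflow.com/questions/ask"));

        // A trailing slash is a different path, so these stay unequal.
        System.out.println(sameAfterNormalize(
                "https://stackoverflow.com/questions/ask",
                "https://stackoverflow.com/questions/ask/"));
    }
}
```

So this covers strict, spec-level equality only; anything looser has to be layered on top.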

There are some problems with your examples:

    http://stackoverflow.com?paramName=
    http://stackoverflow.com?paramName

    http://stackoverflow.com?paramName1=value1&paramName2=value2
    http://stackoverflow.com?paramName2=value2&paramName1=value1

Servers are allowed to assign meaning to the order of parameters, and to the presence of an equals sign, so those pairs are not equivalent according to RFC 3986.
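If you decide those differences don't matter for the servers you care about, a lenient comparison of the query part could be sketched roughly like this. This is a heuristic that deliberately goes beyond RFC 3986, not something the spec or java.net.URI provides; LenientQueryCompare, canonicalQuery and sameQuery are hypothetical names, and the sketch only looks at the query, not the rest of the URL.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.stream.Collectors;

final class LenientQueryCompare {

    /** Hypothetical helper: sorts parameters and treats "name" like "name=". */
    static String canonicalQuery(String rawQuery) {
        if (rawQuery == null || rawQuery.isEmpty()) {
            return "";
        }
        return Arrays.stream(rawQuery.split("&"))
                .filter(p -> !p.isEmpty())                 // ignore empty pieces such as "a&&b"
                .map(p -> p.contains("=") ? p : p + "=")   // assumption: a bare name means an empty value
                .sorted()                                  // assumption: parameter order is irrelevant
                .collect(Collectors.joining("&"));
    }

    /** Compares only the query components of two URLs under the assumptions above. */
    static boolean sameQuery(String a, String b) {
        return canonicalQuery(URI.create(a).getRawQuery())
                .equals(canonicalQuery(URI.create(b).getRawQuery()));
    }

    public static void main(String[] args) {
        System.out.println(sameQuery(
                "http://stackoverflow.com?paramName1=value1&paramName2=value2",
                "http://stackoverflow.com?paramName2=value2&paramName1=value1"));
    }
}
```

Whether sorting is safe depends entirely on the server, which is exactly why the spec does not do it for you.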

    http://stackoverflow.com?param name 1=value 1
    http://stackoverflow.com?param%20name%201=value%201

Not all URL libraries are going to treat these as valid, because the first is not a valid URL according to RFC 3986, although most user-agents agree on how to convert the former to the latter.
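To illustrate with java.net.URI: the strict parser rejects the first form, so you have to coerce it yourself before comparing. The sketch below uses a crude rule (replace every space with %20), which is an assumption about what "most user-agents" do rather than a complete implementation of it; AlmostUrl, isStrictlyValid and coerce are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class AlmostUrl {

    /** Returns true if java.net.URI accepts the string as-is. */
    static boolean isStrictlyValid(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Crude coercion (assumption): percent-encode spaces only, then parse. */
    static URI coerce(String almostUrl) {
        return URI.create(almostUrl.trim().replace(" ", "%20"));
    }

    public static void main(String[] args) {
        String almost = "http://stackoverflow.com?param name 1=value 1";
        String strict = "http://stackoverflow.com?param%20name%201=value%201";

        System.out.println(isStrictlyValid(almost));
        System.out.println(coerce(almost).equals(URI.create(strict)));
    }
}
```

Other illegal characters would need their own handling, so treat this as a starting point only.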


2 Comments

OK thanks for the info. But by using normalize() on the URIs, all of my equality tests still fail, mostly for reasons you've given. There's the spec, and then there's reality. The reality is most servers are going to return the same thing given those "equal" URLs. That's what I want to test for but this answer (while extremely informative) doesn't help me reach that goal.
@tieTYT, I sympathize with your frustration, but there's not a whole lot of code outside browsers that has to take an almost URL and coerce it to a real URL, and browsers are unlikely to try to determine whether two URLs are likely to refer to the same resource based only on textual analysis. I know you don't want to roll your own, but unless searches for "heuristic" or "fuzzy" URL matching work, you're probably out of luck. The syntax extensions in section 2 of HTML5 2.6 might provide a good definition for an almost URL.

Update from 2018

There is a library, OkHttp, whose HttpUrl class can compare URLs properly.

Here are articles about it: https://medium.com/square-corner-blog/okhttps-new-url-class-515460eea661 and http://square.github.io/okhttp/

But keep in mind that it treats these as different URLs:

    http://stackoverflow.com
    https://stackoverflow.com

and

    stackoverflow.com
    www.stackoverflow.com

You can do it like this:

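    // HttpUrl.parse returns null if the string is not a well-formed HTTP or HTTPS URL
    HttpUrl url1 = HttpUrl.parse("http://google.com");
    HttpUrl url2 = HttpUrl.parse("http://google.com/");
    return url1 != null && url1.equals(url2);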
