NOTE: I am asking about a real-world problem, not a theoretical one; see the last part of the question. I want the same result that browsers produce.


Usually you would see the answer:

new java.net.URL(new java.net.URL(base_url),rel_url).toString 

(base_url and rel_url are Strings). In my case base_url is the URL of the page I fetched, and rel_url comes from an "<a href=..." value, so it might even be a single "#" character (for example).

However, such code does not work for URL fragments such as these two pieces:

http://www.hello.com/1.html

?p=2

I tested Firefox, Chromium, Opera, Konqueror, "Web Browser" (Gnome modesty ;-D) -- all of them combine those URLs as:

http://www.hello.com/1.html?p=2

With code as above I get:

http://www.hello.com/?p=2

Question

How do you combine URL fragments in a way that is ready for the real world?

I hope there is already a handy library for that, before I start doing the parsing myself ;-).

  • @JamesBlack: What if rel_url starts with ../? Commented Nov 23, 2011 at 14:54
  • Concatenate the strings? ?p=2 isn't a URL. Commented Nov 23, 2011 at 14:54
  • @Dᴀᴠᴇ Nᴇᴡᴛᴏɴ, for a web browser it is. I am asking for a solution for dealing with the real world, not for theory. Commented Nov 23, 2011 at 14:56
  • @macias Just to verify, what is the content of rel_url? Commented Nov 23, 2011 at 14:59
  • @macias No, Dave is correct. "?p=2" is not a URL so to speak. It can be PART of a URL though. I would also recommend concatenating the strings and then turning them into a URL. Commented Nov 23, 2011 at 14:59

1 Answer

You are misunderstanding what a URL is. ?p=2 is a query string, not a relative URL. (You may also find #foo, which is usually called a fragment identifier or reference and is most commonly used to jump to a section of a long document). The full scheme for URIs is described on Wikipedia among many other places (you can also find the differences between URIs and URLs in various places).
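To make the terminology concrete, here is a minimal sketch that pulls the path, query string and fragment identifier out of a URL with the accessors on java.net.URL. The URL is a made-up example built from the question's host, and the object name UrlParts is only for illustration.

```scala
import java.net.URL

object UrlParts {
  // Hypothetical example URL, chosen only so that every component is present.
  val url = new URL("http://www.hello.com/1.html?p=2#foo")

  val path: String     = url.getPath  // "/1.html"
  val query: String    = url.getQuery // "p=2" (no leading '?'); null when there is no query
  val fragment: String = url.getRef   // "foo" (no leading '#'); null when there is no fragment
}
```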

Anyway, relative URLs refer only to the path part of the URL--it is whether the path is absolute or relative. If you have a query string and wish to attach it to an existing URL (which does not have a query string), just append it to the string. If you don't know whether you have a query string, you can use the methods in the URL class to test for it.
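As a rough sketch of that test, getQuery returns null when the URL has no query string, so it can guard the append. The helper name appendQuery and the choice to return None when a query is already present are my own assumptions, not anything the URL class prescribes.

```scala
import java.net.URL

object AppendQuery {
  // Appends `query` (for example "?p=2") only when `url` has no query string yet.
  // Returns None when a query is already present, leaving that decision to the caller.
  // Assumes `url` carries no fragment; a fragment would have to be split off first.
  def appendQuery(url: URL, query: String): Option[URL] =
    if (url.getQuery == null) Some(new URL(url.toString + query))
    else None
}
```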

If you want to replicate what browsers do, given a full URL url and a String s,

if (s.startsWith("?") || s.startsWith("#")) new java.net.URL(url.toString + s) else new java.net.URL(url, s) 

should do the trick. (I don't know the exact code that different browsers use, but this replicates the behavior that you describe of appending a query string if that is all that is provided in an href.) If you don't know whether your existing URLs might have query strings or not, then you can strip any existing query string or fragment first:

if (s.startsWith("#")) new java.net.URL(url.toString.takeWhile(_ != '#') + s) else if (s.startsWith("?")) new java.net.URL(url.toString.takeWhile(c => c != '?' && c != '#') + s) else new java.net.URL(url, s)
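
Since the question starts from two Strings, the same idea can be wrapped in a small helper. This is only a sketch under the assumptions above, not a full implementation of what browsers do: the names UrlCombine and combine are made up, and everything that does not start with "?" or "#" is still handed to java.net.URL.

```scala
import java.net.URL

object UrlCombine {
  // Resolves `rel` (an href value) against `base` (the URL of the fetched page).
  def combine(base: String, rel: String): String = {
    // The base's own fragment is dropped before a query or fragment is attached.
    val noFragment = base.takeWhile(_ != '#')
    val resolved =
      if (rel.startsWith("#")) new URL(noFragment + rel)
      else if (rel.startsWith("?")) new URL(noFragment.takeWhile(_ != '?') + rel)
      else new URL(new URL(base), rel) // ordinary relative or absolute reference
    resolved.toString
  }
}
```

For the pair from the question, UrlCombine.combine("http://www.hello.com/1.html", "?p=2") gives http://www.hello.com/1.html?p=2, which matches what the browsers produce.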
3 Comments

Why do you assume I don't understand something? I have a "stream" of such pairs, and I have to combine them. I retitled the question to avoid further off-topic comments. So your answer is simply: no cookie, DIY. It is an OK answer of course, but I hoped there was something already in existence (it is 2011 after all). Combining those fragments is not rocket science, yet it is easy to make an error. (question updated)
@macias - I think there is no cookie in Java; I'm sure you can find some library that does it. My complaint was that you were using the wrong terminology, and then complaining that a library that uses the correct terminology is not doing what you want.
Saying is not complaining, I mean it. I am looking for the tool to do the job, that's all. Thank you for the code!
