My requirements are fairly simple, but I need to do a lot of this so I'm looking for a robust solution.
Is there a good light-weight library for decomposing URLs into their component parts in Java? I'm referring to hostname, query string, etc.
I am always forgetting the URI format, so here it is:
<scheme>://<userinfo>@<host>:<port><path>?<query>#<fragment>
And here is an example:
URI uri = new URI("query://jeff@books.com:9000/public/manuals/appliances?stove#ge");
The following will happen:
uri.getAuthority() will return "jeff@books.com:9000"
uri.getFragment() will return "ge"
uri.getHost() will return "books.com"
uri.getPath() will return "/public/manuals/appliances"
uri.getPort() will return 9000
uri.getQuery() will return "stove"
uri.getScheme() will return "query"
uri.getSchemeSpecificPart() will return "//jeff@books.com:9000/public/manuals/appliances?stove"
uri.getUserInfo() will return "jeff"
uri.isAbsolute() will return true
uri.isOpaque() will return false
I found this blog handy: Exploring Java's Network API: URIs and URLs
java.net.URI and java.net.URL do not work for many modern URLs. java.net.URI adheres to RFC 2396, which is a really old standard (published in 1998 and since obsoleted by RFC 3986). java.net.URL sometimes does a good job, but if you're working with URLs as found in the wild, it will fail in many cases.
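To make the strictness concrete, here is a minimal sketch that feeds java.net.URI a few strings of the kind browsers typically accept. The class name StrictUriDemo, the helper parses, and the example.com sample URLs are made up for illustration; only the java.net.URI constructor and URISyntaxException come from the JDK.

```java
import java.net.URI;
import java.net.URISyntaxException;

class StrictUriDemo {
    /** Returns true if java.net.URI accepts the string, false if it throws. */
    static boolean parses(String candidate) {
        try {
            new URI(candidate);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs, not taken from any real site.
        String[] samples = {
            "http://example.com/public/manuals",   // plain URL
            "http://example.com/my file.html",     // unescaped space in the path
            "http://example.com/search?q=a|b"      // unescaped '|' in the query
        };
        for (String s : samples) {
            System.out.println(s + " -> " + (parses(s) ? "parsed" : "rejected"));
        }
    }
}
```

The last two samples are rejected with a URISyntaxException unless the offending characters are percent-encoded first.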
To solve these issues, I wrote galimatias, a URL parsing and normalization library for Java. It will work with almost any URL you can imagine (basically, if it works in a web browser, galimatias will parse it correctly). It also has a very convenient API.
You can get it at: https://github.com/smola/galimatias
Take a look at java.net.URL. It has methods for exactly what you're trying to do.
Hostname: getHost()
Query string: getQuery()
Fragment/ref/anchor: getRef()
Path: getPath()
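As a rough sketch of how those four getters fit together, assuming an http URL modeled on the books.com example earlier in the thread (the class name UrlPartsDemo and the helper describe are hypothetical, introduced only for this illustration):

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlPartsDemo {
    /** Builds a one-line summary of the parts listed above. */
    @SuppressWarnings("deprecation") // the URL(String) constructor is deprecated since Java 20
    static String describe(String spec) {
        try {
            URL url = new URL(spec);
            return "host=" + url.getHost()
                + " path=" + url.getPath()
                + " query=" + url.getQuery()
                + " ref=" + url.getRef();
        } catch (MalformedURLException e) {
            // Thrown for an unknown protocol or an otherwise unparsable string.
            throw new IllegalArgumentException("Not a valid URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        // Prints: host=books.com path=/public/manuals/appliances query=stove ref=ge
        System.out.println(describe("http://books.com:9000/public/manuals/appliances?stove#ge"));
    }
}
```

Note that, unlike java.net.URI, java.net.URL only accepts schemes it has a protocol handler for, so the "query://" scheme from the earlier example would throw MalformedURLException here.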