I want to create an instance of java.net.URI using individual URI components, namely:
- scheme
- userInfo
- host
- port
- path
- query
- fragment
There is a constructor in the java.net.URI class that allows me to do this. Here is the code from the library:

    public URI(String scheme, String authority, String path, String query, String fragment)
        throws URISyntaxException
    {
        String s = toString(scheme, null, authority, null, null, -1, path, query, fragment);
        checkPath(s, scheme, path);
        new Parser(s).parse(false);
    }

This constructor also encodes the path, query, and fragment parts of the URI, so if I pass already-encoded strings as arguments, they will be double-encoded.
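To illustrate the double-encoding concern, here is a minimal sketch. The class name `DoubleEncodingDemo`, the helper `build`, and the sample path and query values are made up for illustration; only the five-argument `java.net.URI` constructor comes from the library.

```java
import java.net.URI;
import java.net.URISyntaxException;

class DoubleEncodingDemo {
    // Hypothetical helper: builds a URI with a fixed scheme and authority
    // through the five-argument constructor and returns its string form.
    static String build(String path, String query) throws URISyntaxException {
        return new URI("http", "example.com", path, query, null).toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // Raw input: the space is quoted once.
        System.out.println(build("/a b", "q=x y"));
        // prints http://example.com/a%20b?q=x%20y

        // Already-encoded input: '%' itself is quoted, so %20 becomes %2520.
        System.out.println(build("/a%20b", "q=x%20y"));
        // prints http://example.com/a%2520b?q=x%2520y
    }
}
```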
The JavaDoc for this constructor states:
- If a path is given then it is appended. Any character not in the unreserved, punct, escaped, or other categories, and not equal to the slash character ('/') or the commercial-at character ('@'), is quoted.
- If a query is given then a question-mark character ('?') is appended, followed by the query. Any character that is not a legal URI character is quoted.
- Finally, if a fragment is given then a hash character ('#') is appended, followed by the fragment. Any character that is not a legal URI character is quoted.
The JavaDoc states that unreserved, punct, and escaped characters are NOT quoted. The punct characters include:
!#$&'()*+,;=:
According to RFC 3986, the reserved characters are:

    reserved   = gen-delims / sub-delims
    gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
    sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

So, if the characters @, / and + are reserved and should always be encoded according to the most up-to-date RFC on URIs (or am I missing something?), then why does the java.net.URI JavaDoc state that it will not encode punct characters (which include + and =), @ and /?
Here is a little example I ran:

    String scheme = "http";
    String userInfo = "username:password";
    String host = "example.com";
    int port = 80;
    String path = "/path/t+/resource";
    String query = "q=search+term";
    String fragment = "section1";
    URI uri = new URI(scheme, userInfo, host, port, path, query, fragment);
    uri.toString(); // will not encode `+` in path

I don't understand: if this is correct behavior and those characters indeed don't need to be encoded, then why are they referred to as "reserved" in the RFC? I'm trying to implement a function that takes a whole URI string and encodes it (that is, extracts the path, query, and fragment, encodes the reserved characters in them, and puts the URI back together).
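For reference, a rough sketch of the split-and-rebuild step, under some assumptions: the components are split with the regular expression given in RFC 3986, Appendix B, and the quoting is left entirely to the five-argument `java.net.URI` constructor. The class name `UriReencoder` and the method `reencode` are hypothetical. Because the constructor always quotes `%`, this only makes sense for input that is not already percent-encoded.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UriReencoder {
    // Component-splitting regular expression from RFC 3986, Appendix B.
    private static final Pattern URI_PARTS = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    // Hypothetical helper: splits a raw URI string into its components and
    // lets java.net.URI quote whatever it considers illegal in each one.
    static URI reencode(String raw) throws URISyntaxException {
        Matcher m = URI_PARTS.matcher(raw);
        if (!m.matches()) {
            throw new URISyntaxException(raw, "cannot split into URI components");
        }
        String scheme = m.group(2);     // null if absent
        String authority = m.group(4);  // null if absent
        String path = m.group(5);       // may be empty
        String query = m.group(7);      // null if absent
        String fragment = m.group(9);   // null if absent
        return new URI(scheme, authority, path, query, fragment);
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(
                reencode("http://example.com/path/t+/my resource?q=search term#section 1"));
        // prints http://example.com/path/t+/my%20resource?q=search%20term#section%201
    }
}
```

In this sketch the spaces are quoted but `+`, `=`, and `/` are left untouched, which is exactly the behavior the question is about.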