
added 473 characters in body

edited Jan 15, 2023 at 5:16


It is precisely because these characters are reserved that Java's API does not encode them. Being reserved means that they have special meaning when they are not escaped.

From the same section of the RFC you linked:

The purpose of reserved characters is to provide a set of delimiting
characters that are distinguishable from other data within a URI.
URIs that differ in the replacement of a reserved character with its
corresponding percent-encoded octet are not equivalent. Percent-
encoding a reserved character, or decoding a percent-encoded octet
that corresponds to a reserved character, will change how the URI is
interpreted by most applications. Thus, characters in the reserved
set are protected from normalization and are therefore safe to be
used by scheme-specific and producer-specific algorithms for
delimiting data subcomponents within a URI.

If java.net.URI always escaped them, you would not be able to express whatever special meaning the reserved characters have. You would only be able to create

http://username:[email protected]:80/path/t%2B/resource?q=search+term#section1

but not

http://username:[email protected]:80/path/t+/resource?q=search+term#section1

which can be URIs that mean different things, according to the RFC.
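To see the difference from the Java side, here is a minimal sketch. The host `example.com` and the class and method names are placeholders chosen for illustration. It parses both spellings with `URI.create` and compares them; since `java.net.URI` compares the raw path, the two should not be equal, even though `getPath()` decodes both to the same string.

```java
import java.net.URI;

public class ReservedCharDemo {
    // Raw path: exactly as written in the URI string, escapes untouched.
    static String rawPath(String uri) {
        return URI.create(uri).getRawPath();
    }

    // Decoded path: escaped octets such as %2B are turned back into characters.
    static String decodedPath(String uri) {
        return URI.create(uri).getPath();
    }

    // Whether java.net.URI considers the two strings to be the same URI.
    static boolean sameUri(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    public static void main(String[] args) {
        // example.com is a placeholder host, not taken from the question.
        String literal = "http://example.com/path/t+/resource";
        String encoded = "http://example.com/path/t%2B/resource";

        System.out.println(rawPath(literal));
        System.out.println(rawPath(encoded));
        System.out.println(decodedPath(encoded));
        System.out.println(sameUri(literal, encoded));
    }
}
```

Note that the decoded paths are identical, so if you need to tell the two URIs apart, compare the raw forms rather than the result of `getPath()`.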

Further down that section, it is also said that

URI producing applications should percent-encode data octets that
correspond to characters in the reserved set unless these characters
are specifically allowed by the URI scheme to represent data in that
component.

In other words, if "these characters are specifically allowed by the URI scheme to represent data in that component", then "URI producing applications should NOT percent-encode data octets...". This is very much the case in the path component, which uses a subset of the reserved characters - /, @, :, and everything in "sub-delims".

This matches what the JavaDoc says about what it doesn't escape. Note that the wording in the JavaDoc (words like "escaped" and "punct") is actually from an older RFC, RFC 2396. With a bit of careful checking, you can see that they are indeed equivalent in this regard.
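A small sketch of that behaviour, using the four-argument `URI(scheme, host, path, fragment)` constructor. The host, the sample paths, and the class and helper names are made up for illustration. As I read the JavaDoc, a `+` in the path should be left alone, a space should be quoted, and a literal `%` is always quoted by the multi-argument constructors.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PathQuotingDemo {
    // Builds a URI from components and returns the path as it appears in the
    // resulting URI string. example.com is a placeholder host.
    static String rawPathOf(String path) {
        try {
            return new URI("http", "example.com", path, null).getRawPath();
        } catch (URISyntaxException e) {
            // Not expected for the sample inputs; rethrown unchecked for brevity.
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // '+' is in the "punct" category, so it is not quoted in the path.
        System.out.println(rawPathOf("/path/t+/resource"));
        // A space is not a legal URI character, so it is quoted as %20.
        System.out.println(rawPathOf("/path/my resource"));
        // '%' is always quoted by these constructors, giving %25.
        System.out.println(rawPathOf("/path/t%2B/resource"));
    }
}
```

So if you really want `%2B` in the path, the multi-argument constructors will not produce it for you; one option is to encode that segment yourself and pass the finished string to the single-argument constructor or `URI.create`.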



answered Jan 15, 2023 at 5:07

Sweeper


