
w3fools claims that URLs can contain spaces: http://w3fools.com/#html_urlencode

Is this true? How can a URL contain an un-encoded space?

I'm under the impression that the request line of an HTTP request uses a space as a delimiter, in the format {method}{space}{path}{space}{protocol}:

GET /index.html HTTP/1.1

Therefore how can a URL contain a space? If it can, where did the practice of replacing spaces with + come from?
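The delimiter problem can be sketched in a few lines of Python, assuming a naive server that splits the request line on single spaces. The helper name `parse_request_line` is hypothetical and not taken from any library or server:

```python
from typing import Tuple


def parse_request_line(line: str) -> Tuple[str, str, str]:
    """Split a request line into (method, target, version).

    Hypothetical helper: it assumes exactly two single-space delimiters,
    as in "GET /index.html HTTP/1.1".
    """
    parts = line.split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {line!r}")
    method, target, version = parts
    return method, target, version


# A percent-encoded space keeps the three-field structure intact.
print(parse_request_line("GET /my%20page.html HTTP/1.1"))

# A literal space produces a fourth field, so the line no longer parses.
try:
    parse_request_line("GET /my page.html HTTP/1.1")
except ValueError as exc:
    print(exc)
```

Real servers may be more or less lenient than this sketch, so a client should not rely on any particular recovery behavior for a literal space.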


4 Answers


A URL must not contain a literal space. The space must be encoded, either with percent-encoding (%20) or with a different encoding that uses URL-safe characters (such as application/x-www-form-urlencoded, which uses + instead of %20 for spaces).

But whether the statement is right or wrong depends on the interpretation: Syntactically, a URI must not contain a literal space and it must be encoded; semantically, a %20 is not a space (obviously) but it represents a space.
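To make the two encodings concrete, here is a short sketch using Python's standard `urllib.parse` module (the choice of Python and the sample strings are illustrative, not part of the answer):

```python
from urllib.parse import quote, quote_plus, urlencode

text = "hello world"

# Percent-encoding, as used in URI paths: the space becomes %20.
print(quote(text))

# application/x-www-form-urlencoded style: the space becomes +.
print(quote_plus(text))

# urlencode() builds a whole form-encoded query string; by default it
# encodes each key and value with quote_plus, so spaces become +.
print(urlencode({"q": text, "page": 2}))
```

Both outputs are valid URI text, and neither contains a literal space; the + form is only interpreted as a space by software that expects form encoding.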


4 Comments

So... is their criticism inaccurate?
@Richard JP Le Guen: That depends on how you interpret it: Syntactically, a URI must not contain a literal space and it must be encoded; semantically, a %20 is not a space (obviously) but it represents a space.
Ya, that's the best interpretation I can come up with, too.
And +1000000 for citing a source. This question wasn't about technology but rather about credibility and misinformation, yet it took all of 2 minutes to have 3 other unjustified, unreferenced and unproven answers which could just as easily be personal opinions. Thank you.

They are indeed fools. If you look at RFC 3986 Appendix A, you will see that the space character does not appear anywhere in the grammar that defines a URI. Since a literal space is not allowed, the only way to represent one is with percent-encoding (%20).
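A rough Python sketch of what "not in the grammar" means in practice: it collects the character classes from RFC 3986 (unreserved, gen-delims, sub-delims, plus % for percent-encoding) and checks that a string uses nothing else. The function name `uses_only_uri_characters` is hypothetical, and this checks the character set only, not the full URI grammar:

```python
import re
import string

# Character classes from RFC 3986 (Sections 2.2 and 2.3, Appendix A).
UNRESERVED = string.ascii_letters + string.digits + "-._~"
GEN_DELIMS = ":/?#[]@"
SUB_DELIMS = "!$&'()*+,;="
ALLOWED = set(UNRESERVED + GEN_DELIMS + SUB_DELIMS + "%")

PCT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")


def uses_only_uri_characters(uri: str) -> bool:
    """Return True if every character can appear in an RFC 3986 URI
    and every '%' begins a valid %XX triplet.

    This does not validate scheme, authority, path structure, etc.
    """
    if any(ch not in ALLOWED for ch in uri):
        return False
    # Remove valid %XX triplets; any '%' left over is malformed.
    return "%" not in PCT_ENCODED.sub("", uri)


print(uses_only_uri_characters("http://www.example.com/my%20beautiful%20page"))
print(uses_only_uri_characters("http://www.example.com/my beautiful page"))
```

The space fails the check simply because it is absent from every character class, which is the point the answer makes about Appendix A.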

In fact, the RFC (Appendix C) even states that whitespace is not part of a URI and should be ignored when a URI is extracted from surrounding text:

In some cases, extra whitespace (spaces, line-breaks, tabs, etc.) may have to be added to break a long URI across lines. The whitespace should be ignored when the URI is extracted.

and

For robustness, software that accepts user-typed URI should attempt to recognize and strip both delimiters and embedded whitespace.
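A minimal sketch of that robustness advice in Python, assuming the only delimiters to handle are a surrounding pair of angle brackets or double quotes. The helper `clean_user_typed_uri` is hypothetical:

```python
import re


def clean_user_typed_uri(text: str) -> str:
    """Hypothetical helper following RFC 3986 Appendix C's advice:
    strip surrounding delimiters and any embedded whitespace."""
    text = text.strip()
    # Remove one pair of surrounding delimiters, e.g. <...> or "...".
    for opening, closing in (("<", ">"), ('"', '"')):
        if len(text) >= 2 and text.startswith(opening) and text.endswith(closing):
            text = text[1:-1]
            break
    # Drop all whitespace (spaces, tabs, line breaks) inside the URI.
    return re.sub(r"\s+", "", text)


# A long URI wrapped across lines inside angle brackets.
print(clean_user_typed_uri("<http://example.com/a/very/\n    long/path>"))
```

Note that this deliberately discards any space the user may have meant to include, which is why a space that really belongs in a resource name has to be written as %20.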

Curiously, the use of + as an encoding for space isn't mentioned in the RFC, although + is reserved as a sub-delimiter. I suspect that its use is either just convention or covered by a different RFC (possibly HTTP).
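The distinction between plain percent-decoding and form decoding shows up directly in Python's `urllib.parse`, which keeps the two conventions in separate functions (the sample strings are illustrative):

```python
from urllib.parse import parse_qs, unquote, unquote_plus

# Plain percent-decoding (as for a path) leaves '+' untouched.
print(unquote("/my+page%20name"))

# Form decoding (application/x-www-form-urlencoded) treats '+' as a space.
print(unquote_plus("my+page%20name"))

# parse_qs() applies form decoding to query-string values.
print(parse_qs("q=my+beautiful+page"))
```

So whether + means a space depends entirely on which decoding the receiving software applies, which matches the comments below about query strings versus paths.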

2 Comments

The character + is not translated into a space (or vice versa) by any part of the HTTP request process in the general case. It is, however, translated into a space when encountered as the value of a parameter in an "application/x-www-form-urlencoded" query string, and often preferred by browser software over %20, for the sake of brevity, when such query strings are appended to request URIs. Of course, the HTTP server may also choose to treat + as equivalent to space within URI paths, but that's not specified by the standard.
However! The same standard, on the same page, also mentions: "Using <> angle brackets around each URI is especially recommended as a delimiting style for a reference that contains embedded whitespace." So how about that?

Spaces are simply replaced by "%20", like this:

http://www.example.com/my%20beautiful%20page
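A sketch of building that exact URL in Python, assuming the path consists of a single segment that may contain spaces. `quote(..., safe="")` also encodes any "/" inside the segment, and `urlunsplit` assembles the parts:

```python
from urllib.parse import quote, unquote, urlunsplit

# Illustrative input: the only part of the URL that contains spaces.
segment = "my beautiful page"

# Encode the segment (spaces become %20), then assemble the full URL.
path = "/" + quote(segment, safe="")
url = urlunsplit(("http", "www.example.com", path, "", ""))
print(url)

# Decoding the path recovers the original spaces.
print(unquote(path))
```

The URL itself contains no literal space, yet decoding the path yields the original text, which is the syntactic versus semantic distinction from the first answer.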

3 Comments

Edited the question to specify un-encoded space.
Clicking on the link gives a 400 page. I think you're missing a 20 after your second %.
I tried this with a DELETE curl API and it worked. Separating strings by + however did not.

I think the information there is partially correct:

That's not true. An URL can use spaces. Nothing defines that a space is replaced with a + sign.

As you noted, a URL can NOT contain spaces. A literal space would break the HTTP request line, which uses spaces as delimiters. I'm not sure where the + is defined, though %20 is standard.
