I'm using the fragment identifier to create a permalink for AJAX events in my web app similar to this guy. Something like:

http://www.myapp.com/calendar#filter:year/2010/month/5 

I've done quite a bit of searching but can't find a list of valid characters for the fragment identifier. The W3C spec doesn't offer anything.

Do I need to encode the characters the same way as in the rest of the URL?

There doesn't seem to be any good information on this anywhere.

3 Answers

See RFC 3986:

fragment    = *( pchar / "/" / "?" )
pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded = "%" HEXDIG HEXDIG
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

So you can use !, $, &, ', (, ), *, +, ,, ;, =, something matching %[0-9a-fA-F]{2}, something matching [a-zA-Z0-9], -, ., _, ~, :, @, /, and ?
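The grammar translates directly into a regular expression. Below is a minimal JavaScript sketch. It assumes the string being checked is the text after the "#", without the "#" itself. The helper name isValidFragment is hypothetical.

```javascript
// RFC 3986, section 3.5:
//   fragment = *( pchar / "/" / "?" )
//   pchar    = unreserved / pct-encoded / sub-delims / ":" / "@"
// Each position is either one allowed literal character or a %XX escape.
const FRAGMENT_RE = /^(?:[A-Za-z0-9\-._~!$&'()*+,;=:@\/?]|%[0-9A-Fa-f]{2})*$/;

// Hypothetical helper: `fragment` is the text after "#", without the "#" itself.
function isValidFragment(fragment) {
  return FRAGMENT_RE.test(fragment);
}

console.log(isValidFragment("filter:year/2010/month/5")); // true
console.log(isValidFragment("50%25"));                    // true (escaped "%")
console.log(isValidFragment("50%"));                      // false (bare "%")
console.log(isValidFragment("a b"));                      // false (space)
```

An empty string also passes, because the grammar's *( ... ) allows zero characters.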

7 Comments

@Artefacto, so does that mean "%" is not allowed just anywhere, but only when two valid characters follow it?
@Pacerier yes, % is only allowed as an escape character. Use %25 to encode a single %.
The back/forward buttons don't work with fragment identifiers that contain a colon, even though the RFC states that it's a valid character.
Wow! It would probably be easier to list which ASCII characters cannot be used!
In case anyone wants a quick-and-dirty sanitizer like I did: myFragment.replace(/(?=((?:[\!\$&'\(\)\*\+,;=a-zA-Z0-9\-._~:@\/?]|%[0-9a-fA-F]{2})*))\1./g, "$1-"); Replace the - in "$1-" with the desired placeholder character.
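The sanitizer one-liner in the last comment can be wrapped in a reusable function. This is a sketch, not the commenter's exact code. The name sanitizeFragment and its placeholder parameter are hypothetical, and it makes two deliberate changes. First, it adds the s flag so that . also matches line terminators. Second, it inserts the placeholder through a replacer function, so a "$" in the placeholder is not treated as a replacement pattern.

```javascript
// Based on the regex from the comment above, with two changes:
//  - the `s` flag lets "." also match line terminators such as "\n";
//  - a replacer function inserts the placeholder verbatim, so a "$" in it is
//    not treated as a replacement pattern.
// The lookahead captures the longest run of valid characters, and \1 then
// consumes exactly that run. JavaScript has no atomic groups, so this trick keeps
// the engine from backtracking into the run; "." can only match the character
// that ends it (an invalid character, or a "%" without two hex digits).
function sanitizeFragment(fragment, placeholder = "-") {
  return fragment.replace(
    /(?=((?:[\!\$&'\(\)\*\+,;=a-zA-Z0-9\-._~:@\/?]|%[0-9a-fA-F]{2})*))\1./gs,
    (match, validRun) => validRun + placeholder
  );
}

console.log(sanitizeFragment("filter:year/2010/month 5")); // filter:year/2010/month-5
console.log(sanitizeFragment("a#b<c", "_"));              // a_b_c
console.log(sanitizeFragment("50%25"));                   // 50%25 (already valid)
```

The regex works on UTF-16 code units, not full characters. A character outside the Basic Multilingual Plane (an emoji, for example) is two units, so it becomes two placeholders.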

https://www.rfc-editor.org/rfc/rfc3986#section-3.5:

fragment = *( pchar / "/" / "?" ) 

and

pchar       = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
pct-encoded = "%" HEXDIG HEXDIG

So, combined, the fragment cannot contain #, a raw %, ^, [, ], {, }, \, ", < and > according to the RFC.
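This also answers whether the fragment needs encoding. Any character outside the allowed set has to be percent-encoded. Below is a minimal sketch that escapes everything the grammar does not allow literally. The name encodeFragment is hypothetical. Encoding non-ASCII characters as UTF-8 bytes is an assumption, though it is what browsers do.

```javascript
// Characters RFC 3986 allows literally in a fragment:
// unreserved, sub-delims, ":", "@", "/" and "?". "%" is deliberately absent.
const FRAGMENT_SAFE = /[A-Za-z0-9\-._~!$&'()*+,;=:@\/?]/;

// Hypothetical helper: percent-encodes everything else, using the UTF-8 bytes
// of each character. A literal "%" always becomes "%25", so the input is treated
// as raw text, never as already-encoded text.
function encodeFragment(text) {
  const utf8 = new TextEncoder();
  let out = "";
  for (const ch of text) { // for...of walks code points, not UTF-16 units
    if (FRAGMENT_SAFE.test(ch)) {
      out += ch;
    } else {
      for (const byte of utf8.encode(ch)) {
        out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
      }
    }
  }
  return out;
}

console.log(encodeFragment("filter:year/2010/month/5")); // filter:year/2010/month/5
console.log(encodeFragment("50% off"));                  // 50%25%20off
console.log(encodeFragment("caf\u00e9"));                // caf%C3%A9
```

The built-in encodeURIComponent also produces a valid fragment. However, it escapes characters the grammar permits literally, such as /, :, ? and @. With it, the permalink from the question would become filter%3Ayear%2F2010%2Fmonth%2F5. In both cases, decodeURIComponent reverses the encoding.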

3 Comments

Thanks. I accepted Artefacto's answer since he was a hair faster, but gave you +1 for the response.
I suppose you're also missing non-printable ASCII characters and non-ASCII characters.
It seems that you forgot VERTICAL BAR (|), GRAVE ACCENT (`) and SPACE ( ) in the not-allowed list. So the full list of printable (7-bit) US-ASCII characters that are not allowed is: "#%< >[\]^`{|}
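The not-allowed list can be checked mechanically. The sketch below tests every printable US-ASCII character (0x20 to 0x7E) against the literal characters allowed by the RFC 3986 fragment grammar quoted in this answer. It leaves out "%", since "%" is only valid as the start of a %XX escape. The variable names are illustrative.

```javascript
// Literal characters allowed by the RFC 3986 fragment grammar.
// "%" is left out because it is only valid as the start of a %XX escape.
const ALLOWED = /[A-Za-z0-9\-._~!$&'()*+,;=:@\/?]/;

// Walk every printable 7-bit US-ASCII character, from SPACE (0x20) to "~" (0x7E),
// and keep the ones that may not appear literally in a fragment.
const disallowed = [];
for (let code = 0x20; code <= 0x7e; code++) {
  const ch = String.fromCharCode(code);
  if (!ALLOWED.test(ch)) {
    disallowed.push(ch);
  }
}

console.log(disallowed.length); // 14
console.log(JSON.stringify(disallowed));
```

The 14 characters are SPACE, ", #, %, <, >, [, \, ], ^, `, {, | and }, which matches the full list in the last comment. Control characters and non-ASCII characters fall outside this range, but they are not allowed literally either.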

Another RFC covers this as well: RFC 1738.

; URL schemeparts for ip based protocols:

; HTTP
httpurl  = "http://" hostport [ "/" hpath [ "?" search ]]
hpath    = hsegment *[ "/" hsegment ]
hsegment = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
search   = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
