
This is a really quick question: what is the character encoding used in symbolic ref files like .git/HEAD, especially on Windows?

Is it the same as the filesystem's encoding? That seems improbable, since I've heard that Windows' filesystem encoding is UTF-16, and the ASCII control bytes 0x00..0x1F and 0x7F are prohibited in Git ref names (so a ref cannot contain a 0x00 byte). Is it UTF-8 universally? If so, that does not seem to be documented in git help check-ref-format. Maybe it is documented somewhere else? Or is the encoding of symbolic refs undefined? But then how can we clone, push and fetch branches between each other's repositories?

8
  • I'm sure it's ASCII sans ASCII control characters. Commented Sep 14, 2021 at 9:09
  • 1
    You are not supposed to open and read these files directly. You are supposed to run git symbolic-ref instead. Use whatever encoding git symbolic-ref uses as its input and output, and don't worry about what encoding might appear in a Git internal file, because that encoding might change tomorrow. Commented Sep 14, 2021 at 9:19
  • 1
    Note that push and fetch cannot handle symbolic refs in general. There is one special case for HEAD, handled by git clone and git remote set-head, in an undocumented fashion (it's full of historical oddities, nobody wants to describe them in anything official :-) ). Don't try to transfer symbolic refs from one repository to another; it doesn't work in general. Commented Sep 14, 2021 at 9:21
  • 1
    Side note: when references are packed there is no "ref file". Use git symbolic-ref to poke around and git update-ref to mess with references. Commented Sep 14, 2021 at 10:58
  • 1
    Don't trust it to be a path name; as @phd notes, sometimes it isn't. The upcoming reftable project will remove all of these files, except perhaps for HEAD itself, in favor of a real (if somewhat oddly encoded) refs database. Commented Sep 14, 2021 at 12:25

1 Answer


There is no specific character encoding used by Git's refs. The format is specified in the git check-ref-format manual page, and it allows a variety of byte values, including values which are not valid UTF-8, such as 0xFE and 0xFF.
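
To illustrate that last point, here is a minimal Python sketch showing that a byte string containing 0xFE or 0xFF is rejected by a strict UTF-8 decoder, while an ordinary ref name is accepted. The helper `is_valid_utf8` and the sample ref names are hypothetical and only for illustration; nothing here is part of Git.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical ref names, used only as sample input.
ascii_ref = b"refs/heads/main"
utf8_ref = "refs/heads/f\u00fcr".encode("utf-8")  # non-ASCII, but valid UTF-8
raw_ref = b"refs/heads/\xfe\xff"                  # bytes that never occur in UTF-8

for ref in (ascii_ref, utf8_ref, raw_ref):
    print(ref, is_valid_utf8(ref))
```

The first two names pass the check and the third does not, which is why code that handles ref names should not assume they always decode as UTF-8.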

However, having said that, it is customary to use UTF-8 for ref names, and when ref files are written into the file system on Windows, their file names will be converted into UTF-16 because Windows can't handle anything else in its file system. The contents of the files, however, remain arbitrary bytes, which, again, are customarily (but need not be) UTF-8.
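
If you do need to look at such content yourself, one cautious approach is to treat it as bytes and decode it only for display. The sketch below assumes the loose-file form `ref: <name>` followed by a newline, as commonly seen in .git/HEAD; as the comments point out, that on-disk format is internal and may change, so git symbolic-ref remains the supported interface. The function names `parse_symbolic_ref` and `display_name` are hypothetical.

```python
from typing import Optional


def parse_symbolic_ref(raw: bytes) -> Optional[bytes]:
    """Return the target ref name as bytes, or None if raw is not a symbolic ref.

    Assumes the "ref: <name>\\n" layout; this is an internal detail of Git.
    """
    prefix = b"ref:"
    if not raw.startswith(prefix):
        return None  # e.g. a detached HEAD that holds an object ID instead
    return raw[len(prefix):].strip()


def display_name(ref: bytes) -> str:
    """Decode for display as UTF-8, keeping undecodable bytes recoverable."""
    return ref.decode("utf-8", errors="surrogateescape")


# Sample content, standing in for the bytes read from a symbolic ref file.
target = parse_symbolic_ref(b"ref: refs/heads/main\n")
if target is not None:
    print(display_name(target))
```

Using `surrogateescape` means a name that is not valid UTF-8 can still be turned back into the exact original bytes with `.encode("utf-8", errors="surrogateescape")`, which matters if the name is later passed to another Git command.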
