3

I can't find anything in the documentation. If I do a git pull, am I guaranteed that each underlying file resulting from the merge is written atomically?

Some more context about what I am trying to achieve: I have some scripts that periodically do a git pull and I need to know if I can rely on the state of the files being valid during a pull.

We are basically using git as a deployment tool. We never have merge conflicts by design. On the remote end, a job constantly pulls every x seconds, and other jobs read the files. What could happen is that we open a file while git is still writing it, and the contents of the file are not what we are expecting, unless git is smart enough to use some atomic swap on the underlying OS (Red Hat in this case).

2
  • First of all, what does "valid" mean in that context? It's hard for us to know, since we don't know what you're trying to accomplish. I strongly suspect that if you need to know the state of the file during a pull, you're trying to accomplish something that refuses to let git just be a source control tool... so what "valid" means is far from obvious and very important to the question. I can tell you that in the event of a merge conflict, there will come a point where the file stops being "valid code", whether it's atomically written or not. Commented Mar 19, 2018 at 12:11
  • you're right, I added some context. Hope it helps Commented Mar 19, 2018 at 12:26

2 Answers

7

The short answer is no.

It's worth considering that git pull isn't about files at all; it's about commits. Files are just a side effect. :-) The pull operation is just git fetch (obtain commits) followed by a second Git command, usually git merge. The merge step merges commits. This has a side effect of merging files as well, if the operation is a true merge rather than a fast-forward. When the merge or fast-forward is complete, Git does a git checkout of the resulting commit.

So this really boils down to: Is git checkout atomic at the OS level? The answer is a very loud no: it's not atomic in any way. Individual files written in the work-tree are written one at a time, using OS-level write calls, which are not atomic. Files that need to be created or deleted are done one at a time. Git does use the index, which indexes (i.e., keeps tabs on) the work-tree, to minimize the number of files removed, created, or rewritten-in-place. Git also locks against other Git operations, and makes the Git-level transaction appear atomic—but anything working outside Git, that does not cooperate with Git's locking system, will be able to see the changes as they occur.
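For contrast, here is a minimal Python sketch of the "atomic swap" pattern the question alludes to: write to a temporary file in the same directory, then rename it over the target. This is not something Git does for work-tree files; it only illustrates what a writer would have to do for readers to see either the old or the new content. The function name `atomic_write` is hypothetical, and the sketch assumes a POSIX filesystem where rename is atomic.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` so readers see the old or the new content, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        # os.replace uses rename(2) on POSIX: the name switches in one step.
        # Note: the new file has mkstemp's default permissions (0600).
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A reader that opens `path` at any moment gets one complete version. A plain in-place rewrite, which is what the answer describes for checkout, gives no such guarantee.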


2 Comments

thanks. That's what I was expecting. It could be nice if there was an option to force an atomic swap on the files, but that's probably not what git is designed for
Note that you can build your own atomicity using separate work-trees (with separate index files) and a rename operation, assuming you have an atomic rename. The idea here is to check out the new work-tree (using a new empty index) into an empty temporary directory that can live in the same place as the current tree. You then rename the "in use" tree out of the way, and rename the new tree into place. There is a brief window when there is no tree at all, but that's as small as you can get unless your OS offers a "swap names" operation.
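The approach in the comment above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not a tested deployment tool: the function names, the `.index` and `.old` suffixes, and the directory layout are all hypothetical. The Git side uses `git read-tree` and `git checkout-index` with `GIT_INDEX_FILE` pointing at a throwaway index, which is one way (among others) to check out into an empty directory with a separate index.

```python
import os
import shutil
import subprocess


def checkout_into(git_dir: str, commit: str, target: str) -> None:
    """Populate the new directory `target` from `commit` via a separate index."""
    os.makedirs(target)
    # A throwaway index kept next to (not inside) the new tree.
    env = dict(os.environ, GIT_INDEX_FILE=target + ".index")
    base = ["git", "--git-dir", git_dir, "--work-tree", target]
    # Load the commit's tree into the temporary index...
    subprocess.run(base + ["read-tree", commit], env=env, check=True)
    # ...then write every indexed file into the target directory.
    subprocess.run(base + ["checkout-index", "--all", "--force"],
                   env=env, check=True)


def swap_into_place(new_tree: str, live_tree: str) -> None:
    """Rename the live tree out of the way, then rename the new tree in."""
    old_tree = live_tree + ".old"
    # Drop the tree left over from the previous swap, if any.
    if os.path.exists(old_tree):
        shutil.rmtree(old_tree)
    if os.path.exists(live_tree):
        os.rename(live_tree, old_tree)  # brief window: no live tree exists
    os.rename(new_tree, live_tree)      # window closes here
```

The previous tree is kept as `live_tree + ".old"` until the next swap, so readers that already opened files in it are not disturbed mid-read. Both directories need to be on the same filesystem for the renames to be atomic.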
1

On the git checkout part of git pull, see torek's answer.

On the git fetch part of git pull, there is an --atomic flag; Git 2.36 (Q2 2022) clarifies its behavior.

"git fetch" can make two separate fetches, but ref updates coming from them were in two separate ref transactions under "--atomic", which has been corrected with Git 2.36 (Q2 2022).

See commit 583bc41, commit b3a8046, commit 4f2ba2d, commit 62091b4, commit 2983cec, commit efbade0, commit 2a0cafd (17 Feb 2022) by Patrick Steinhardt (pks-t).
(Merged by Junio C Hamano -- gitster -- in commit 851d2f0, 13 Mar 2022)

fetch: increase test coverage of fetches

Signed-off-by: Patrick Steinhardt

When using git fetch with the --atomic flag, the expectation is that either all of the references are updated, or alternatively none are in case the fetch fails.

While we already have tests for this, we do not have any tests which exercise atomicity either when pruning deleted refs or when backfilling tags.
This gap in test coverage hides that we indeed don't handle atomicity correctly for both of these cases.

Add test cases which cover these testing gaps to demonstrate the broken behaviour.
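As an illustration of how a periodic script might request the all-or-nothing ref update described in this commit message, here is a small Python sketch. The helper names and the default remote `origin` are assumptions; `--atomic` and `--prune` are existing `git fetch` options. Note that this only concerns ref updates in the repository, not the files in the work-tree.

```python
import subprocess
from typing import List


def fetch_atomic_command(remote: str = "origin") -> List[str]:
    """Build a fetch command whose ref updates go through one transaction."""
    return ["git", "fetch", "--atomic", "--prune", remote]


def fetch_atomic(repo_dir: str, remote: str = "origin") -> bool:
    """Run the fetch in `repo_dir`; True means all ref updates were applied."""
    result = subprocess.run(fetch_atomic_command(remote), cwd=repo_dir)
    # With --atomic, a non-zero exit should mean no refs were updated.
    return result.returncode == 0
```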


Warning:

With Git 2.36 (Q2 2022), revert the "deletion of a ref should not trigger transaction events for loose and packed ref backends separately" that regresses the behaviour when a ref is not modified since it was packed.

See commit 4315986, commit 347cc1b, commit c6da34a (13 Apr 2022) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit 4027e30, 14 Apr 2022)

4027e30c53:Merge branch 'jc/revert-ref-transaction-hook-changes'

Revert "fetch: increase test coverage of fetches"
Revert "Merge branch 'ps/avoid-unnecessary-hook-invocation-with-packed-refs'"


"git fetch --atomic" issued an unnecessary empty error message, which has been corrected with Git 2.44 (Q1 2024).

See commit 18ce489, commit 97d82b2 (17 Dec 2023) by Jiang Xin (jiangxin).
(Merged by Junio C Hamano -- gitster -- in commit deb67d1, 27 Dec 2023)

fetch: no redundant error message for atomic fetch

Helped-by: Patrick Steinhardt
Signed-off-by: Jiang Xin
Acked-by: Patrick Steinhardt

If an error occurs during an atomic fetch, a redundant error message will appear at the end of do_fetch().
It was introduced in b3a8046 ("fetch: make --atomic flag cover backfilling of tags", 2022-02-17, Git v2.36.0-rc0 -- merge listed in batch #11).

Because a failure message is displayed before setting retcode in the function do_fetch(), calling error() on the err message at the end of this function may result in redundant or empty error message to be displayed.

We can remove the redundant error() function, because we know that the function ref_transaction_abort() never fails.
While we can find a common pattern for calling ref_transaction_abort() by running the command "git grep -A1 ref_transaction_abort", e.g.:

    if (ref_transaction_abort(transaction, &error))
        error("abort: %s", error.buf);

Following this pattern, we can tolerate the return value of the function ref_transaction_abort() being changed in the future.
We also delay the output of the err message to the end of do_fetch() to reduce redundant code.
