3

I have the following repository layout:

Repo
|-- Folder1
|   |-- file1
|   |-- file2
|   ...
|-- Folder2
|   |-- file3
|   |-- file4
|   ...
|-- file5
|-- file6
...

Where 'Repo' is the main repository and 'Folder1' and 'Folder2' are two submodules.

Working with files in the root directory (file5 and file6 in the example) is straightforward, but I have doubts about how to correctly push the changes I make inside the submodules.

Right now I'm editing the file inside each submodule, then committing and pushing each change from within that submodule:

[Repo]$ cd Folder1
[Folder1]$ vi file1
... do some changes ...
[Folder1]$ git add file1
[Folder1]$ git commit -m "Some changes"
[Folder1]$ git push

If I check git status inside the submodule, it reports that everything is up to date:

[Folder1]$ git status
# On branch master
nothing to commit, working directory clean

But if I check the status from the root folder, it tells me that the submodule I've already pushed has new commits:

[Folder1]$ cd ..
[Repo]$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   Folder1 (new commits)
#
no changes added to commit (use "git add" and/or "git commit -a")

Correct me if I'm wrong, but I suppose the above is due to git checking the current local commit ID of the submodule against the information on the git server I've pushed to. So what I actually do now is commit everything in the main repo and push it as well. Is this the correct way to work with submodules?

3 Answers

1
[Repo]$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   Folder1 (new commits)
#
no changes added to commit (use "git add" and/or "git commit -a")

The above is due to the parent repository Repo pointing to an older commit of the Folder1 submodule.
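If you want to see exactly which submodule commits the parent does not yet record, two quick checks (the hashes and commit subject below are placeholders):

[Repo]$ git diff --submodule
Submodule Folder1 1234567..89abcde:
  > Some changes

[Repo]$ git submodule status
+89abcdef0123456789abcdef0123456789abcdef Folder1 (heads/master)

The leading "+" from git submodule status means the commit checked out in Folder1 differs from the one recorded in Repo's index.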

To your other question, yes, this is the correct way to work with submodules.

  1. Commit the submodule.
  2. Push the submodule.
  3. Commit in the parent repository.
  4. Push the parent repository.
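A minimal command sketch of that sequence, assuming the submodule lives in Folder1 and both repositories push to their default remotes:

[Repo]$ cd Folder1
[Folder1]$ git add file1
[Folder1]$ git commit -m "Some changes"        # 1. commit the submodule
[Folder1]$ git push                            # 2. push the submodule
[Folder1]$ cd ..
[Repo]$ git add Folder1                        # stage the new submodule commit
[Repo]$ git commit -m "Update Folder1"         # 3. commit in the parent repository
[Repo]$ git push                               # 4. push the parent repository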

In this way you do not risk pushing a dangling submodule reference, i.e., a reference in the parent repository to a commit in the submodule that has not been pushed.

Notice that, in the parent repository, it is not mandatory to commit and push every submodule update. Other users can work on the master branch of the submodule and do a standard git fetch / git pull to download new revisions.
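For example, another user who pulls the parent repository can bring the submodules to the commits it records (a sketch, initializing the submodules on the fly if needed):

[Repo]$ git pull
[Repo]$ git submodule update --init --recursive

or, to simply follow the submodule's master branch:

[Repo]$ cd Folder1
[Folder1]$ git checkout master
[Folder1]$ git pull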

I personally only commit a reference to a submodule when I'm going to release a new version of that submodule, to mark that release in the parent repository. My submodules are applications, and the parent repository is a shared library.


0

What I actually do now is commit all and push it up in the main repo.

If by this you mean "add each submodule to the root's index and commit/push the root," then you are correct.

The reason for this is that a submodule is considered a totally separate project from the main one. Git allows you to specify an exact commit hash that the submodule is supposed to be at; this allows you to tightly control the version of the subproject that your main project uses. When you update the subproject, you must manually update the referenced commit in the main project (and push that commit).
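In command form, updating the referenced commit in the main project looks something like this (a sketch; the commit message is illustrative):

[Repo]$ git add Folder1                        # stage the gitlink pointing at the new submodule commit
[Repo]$ git commit -m "Bump Folder1 to its latest commit"
[Repo]$ git push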

Note: don't commit and push references to submodule commits that you don't intend to make public! This will frustrate other developers by referencing work that they cannot access. Also, as has been pointed out, you needn't commit and push the submodule reference to the main project every time you change the submodule, only when the submodule reaches a new stable state.
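To guard against publishing a reference to unpushed submodule work, you can let git push check (or push) the submodules for you:

[Repo]$ git push --recurse-submodules=check       # refuse to push if a referenced submodule commit hasn't been pushed
[Repo]$ git push --recurse-submodules=on-demand   # or push the required submodule commits automatically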


0

In this way you do not risk pushing a dangling submodule reference, i.e., a reference in the parent repository to a commit in the submodule that has not been pushed.

To be extra sure, you can take advantage of the fsck improvements in Git 2.31 (Q1 2021): fsck-ing the incoming objects in "index-pack" is attractive for performance reasons (we have them already in core, inflated and ready to be inspected), but it fundamentally cannot be applied fully when we receive more than one pack stream, because a tree object in one pack may refer to a blob object in another pack as ".gitmodules", which is exactly the kind of blob we want to inspect as a ".gitmodules" file.
Teach "index-pack" to emit objects that must be inspected later and check them in the calling "fetch-pack" process.
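These checks only apply if fsck-ing of incoming objects is enabled; a minimal sketch using the standard configuration keys:

[Repo]$ git config fetch.fsckObjects true      # fsck objects received by git fetch
[Repo]$ git config transfer.fsckObjects true   # or set the default for both fetch.fsckObjects and receive.fsckObjects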

See commit 5476e1e, commit b664e9f, commit 27e35ba, commit 726b25a (22 Feb 2021) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit 6ee353d, 01 Mar 2021)

fetch-pack: print and use dangling .gitmodules

Signed-off-by: Jonathan Tan

Teach index-pack to print dangling .gitmodules links after its "keep" or "pack" line instead of declaring an error, and teach fetch-pack to check such lines printed.

This allows the tree side of the .gitmodules link to be in one packfile and the blob side to be in another without failing the fsck check, because it is now fetch-pack which checks such objects after all packfiles have been downloaded and indexed (and not index-pack on an individual packfile, as it is before this commit).

git index-pack now includes in its man page:

For internal use only.

Die if the pack contains broken objects. If the pack contains a tree pointing to a .gitmodules blob that does not exist, prints the hash of that blob (for the caller to check) after the hash that goes into the name of the pack/idx file (see "Notes").

And this is made more robust with Git 2.32 (Q2 2021):

See commit 3745e26, commit c96e184, commit 462f5ca, commit c15087d, commit 53692df, commit 394d5d3, commit 44e07da, commit 901f2f6, commit b549502, commit c72da1a, commit 30cf618, commit 1b32b59, commit e35d65a, commit 35af754, commit 034a7b7, commit f1abc2d, commit a1aad71, commit d385784 (28 Mar 2021), and commit fb79f5b (17 Mar 2021) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit 5644419, 07 Apr 2021)

fetch-pack: use new fsck API to printing dangling submodules

Signed-off-by: Ævar Arnfjörð Bjarmason

Refactor the check added in 5476e1e ("fetch-pack: print and use dangling .gitmodules", 2021-02-22, Git v2.31.0-rc1 -- merge) to make use of us now passing the "msg_id" to the user defined "error_func".
We can now compare against the FSCK_MSG_GITMODULES_MISSING instead of parsing the generated message.


"git fetch-pack -k -k"(man) without passing --lock-pack (which we never do ourselves) did not work at all, which has been corrected with Git 2.46 (Q3 2024), batch 17.

See commit 96a6621 (19 Jun 2024) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit b781a3e, 27 Jun 2024)

fetch-pack: fix segfault when fscking without --lock-pack

Reported-by: Kirill Smelkov
Signed-off-by: Jeff King

The fetch-pack internals have multiple options related to creating ".keep" lock-files for the received pack:

  • if args.lock_pack is set, then we tell index-pack to create a .keep file.
    In the fetch-pack plumbing command, this is triggered by passing "-k" twice.
  • if the caller passes in a pack_lockfiles string list, then we use it to record the path of the keep-file created by index-pack.
    We get that name by reading the stdout of index-pack.
    In the fetch-pack command, this is triggered by passing the (undocumented) --lock-pack option; without it, we pass in a NULL string list.

So it's possible to ask index-pack to create the lock-file (using "-k -k") but not ask to record it (by avoiding "--lock-pack").
This worked fine until 5476e1e ("fetch-pack: print and use dangling .gitmodules", 2021-02-22, Git v2.31.0-rc1 -- merge), but now it causes a segfault.
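A plumbing invocation that hits this combination might look like the following (a sketch; the repository URL is a placeholder, and the fsck pass only runs when fetch.fsckObjects is enabled):

[Repo]$ git -c fetch.fsckObjects=true fetch-pack -k -k https://example.com/repo.git refs/heads/master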

Before that commit, if pack_lockfiles was NULL, we wouldn't bother reading the output from index-pack at all.
But since that commit, index-pack may produce extra output if we asked it to fsck.
So even if nobody cares about the lockfile path, we still need to read it to skip to the output we do care about.

We correctly check that we didn't get a NULL lockfile path (which can happen if we did not ask it to create a .keep file at all), but we missed the case where the lockfile path is not NULL (due to "-k -k") while the pack_lockfiles string_list is NULL (because nobody passed "--lock-pack"), and we segfault trying to add to the NULL string-list.

We can fix this by skipping the append to the string list when either the value or the list is NULL.
In that case we must also free the lockfile path to avoid leaking it when it's non-NULL.

Nobody noticed the bug for so long because the transport code used by "git fetch"(man) always passes in a pack_lockfiles pointer, and remote-curl (the main user of the fetch-pack plumbing command) always passes --lock-pack.

