166

This question is based on Detach subdirectory into separate Git repository

Instead of detaching a single subdirectory, I want to detach a couple. For example, my current directory tree looks like this:

/apps
  /AAA
  /BBB
  /CCC
/libs
  /XXX
  /YYY
  /ZZZ

And I would like this instead:

/apps
  /AAA
/libs
  /XXX

The --subdirectory-filter argument to git filter-branch won't work because it gets rid of everything except for the given directory the first time it's run. I thought using the --index-filter argument for all unwanted files would work (albeit tedious), but if I try running it more than once, I get the following message:

Cannot create a new backup.
A previous backup already exists in refs/original/
Force overwriting the backup with -f

Any ideas? TIA

11 Answers

178

Instead of having to deal with a subshell and using ext glob (as kynan suggested), try this much simpler approach:

git filter-branch --index-filter 'git rm --cached -qr --ignore-unmatch -- . && git reset -q $GIT_COMMIT -- apps/AAA libs/XXX' --prune-empty -- --all 

As mentioned by void.pointer's comment, this will remove everything except apps/AAA and libs/XXX from current repository.
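
To see what the index filter does, here is a minimal sketch that runs the same command on a throwaway repository. The temporary directory, file names, and commit messages are hypothetical; only the filter-branch invocation is taken from the answer. FILTER_BRANCH_SQUELCH_WARNING is set on the assumption that a recent Git is in use, which otherwise prints a warning and pauses.

```shell
#!/bin/sh
# Sketch: run the index filter from the answer on a throwaway repository.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the filter-branch warning delay

demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# One commit per directory, so each commit touches exactly one path.
for dir in apps/AAA apps/BBB libs/XXX libs/YYY; do
    mkdir -p "$dir"
    echo "content of $dir" > "$dir/file.txt"
    git add "$dir/file.txt"
    git commit -q -m "add $dir"
done

# For every commit: empty the index, then restore only the wanted paths
# from the original commit ($GIT_COMMIT). Commits left empty are pruned.
git filter-branch --index-filter \
    'git rm --cached -qr --ignore-unmatch -- . && git reset -q $GIT_COMMIT -- apps/AAA libs/XXX' \
    --prune-empty -- --all >/dev/null 2>&1

kept_files=$(git ls-files | tr '\n' ' ')
commit_count=$(git rev-list --count HEAD)
echo "kept files: $kept_files"
echo "commits:    $commit_count"
```

In this sketch, only the two commits that touched apps/AAA and libs/XXX should remain; the commits for apps/BBB and libs/YYY become empty and are dropped by --prune-empty.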

Prune empty merge commits

This leaves behind lots of empty merges. These can be removed by another pass as described by raphinesse in his answer:

git filter-branch --prune-empty --parent-filter \
'sed "s/-p //g" | xargs -r git show-branch --independent | sed "s/\</-p /g"'

⚠️ Warning: The above requires the GNU versions of sed and xargs; otherwise xargs fails and the filter removes all commits. If your system ships other versions, run brew install gnu-sed findutils and then use gsed and gxargs:

git filter-branch --prune-empty --parent-filter \
'gsed "s/-p //g" | gxargs git show-branch --independent | gsed "s/\</-p /g"'
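
As a rough illustration of what the two sed stages in the parent filter do, the sketch below applies them to a sample parent string. The values abc123 and def456 are hypothetical stand-ins for commit hashes, and the git show-branch --independent step in the middle is left out. It assumes GNU sed is available as sed (substitute gsed where it is not), because \< is a GNU extension.

```shell
#!/bin/sh
# Sketch: the two sed stages of the parent filter on a sample parent string.
# "abc123" and "def456" are hypothetical stand-ins for real commit hashes.
parents="-p abc123 -p def456"

# Stage 1: strip the "-p " markers, leaving a plain list of commit IDs.
ids=$(printf '%s\n' "$parents" | sed "s/-p //g")

# (In the real filter, "git show-branch --independent" would now drop
# parents that are reachable from other parents; skipped in this sketch.)

# Stage 2: put "-p " back in front of every word (\< is GNU sed syntax).
restored=$(printf '%s\n' "$ids" | sed "s/\</-p /g")

echo "$ids"
echo "$restored"
```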

19 Comments

additionally, the --ignore-unmatch flag should be passed to git rm, it failed for the very first commit for me otherwise (the repository was created with git svn clone in my case)
Assuming you have tags in the mix, you should probably add --tag-name-filter cat to your parameters
Could you add some more information explaining what this lengthy command is doing?
I'm pleasantly surprised that this works perfectly on Windows using git bash, phew!
@BurhanAli For every commit in history, it is deleting all files except the ones you want to keep. When everything is done, you are left with only the portion of the tree you specified, along with only that history.
62

An easy solution: git-filter-repo

I had a similar issue and, after reviewing the various approaches listed here, I discovered git-filter-repo. It is recommended as an alternative to git-filter-branch in the official git documentation here.

To create a new repository from a subset of directories in an existing repository, you can use the command:

git filter-repo --path <file_to_keep> 

Filter multiple files/folders by chaining them:

git filter-repo --path keepthisfile --path keepthisfolder/ 

So, to answer the original question, with git-filter-repo you would just need the following command:

git filter-repo --path apps/AAA/ --path libs/XXX/ 

1 Comment

This is definitely a great answer. The problem with all other solutions is that I couldn't manage to extract the contents of ALL branches of a directory. However, git filter-repo retrieved the folder from all branches and rewrote history perfectly, like cleaning the whole tree of everything I didn't need.
42

Manual steps with simple git commands

The plan is to split the individual directories into their own repos, then merge them together. The following manual steps use easy-to-understand commands rather than hard-to-follow scripts, and can be extended to merge N additional sub-folders into a single repository.

Divide

Let's assume your original repo is: original_repo

1 - Split apps:

git clone original_repo apps-repo
cd apps-repo
git filter-branch --prune-empty --subdirectory-filter apps master

2 - Split libs

git clone original_repo libs-repo
cd libs-repo
git filter-branch --prune-empty --subdirectory-filter libs master

Continue if you have more than 2 folders. You should now have two new, temporary Git repositories.

Conquer by Merging apps and libs

3 - Prepare the brand new repo:

mkdir my-desired-repo
cd my-desired-repo
git init

You need to make at least one commit first. If you skip the following three lines, the first imported repo will appear directly under the new repo's root:

touch a_file_and_make_a_commit # see user's feedback
git add a_file_and_make_a_commit
git commit -am "at least one commit is needed for it to work"

With the temp file committed, the merge command in the later section will stop as expected.

Taking from user's feedback, instead of adding a random file like a_file_and_make_a_commit, you can choose to add a .gitignore, or README.md etc.

4 - Merge apps repo first:

git remote add apps-repo ../apps-repo
git fetch apps-repo
git merge -s ours --no-commit apps-repo/master # see below note.
git read-tree --prefix=apps -u apps-repo/master
git commit -m "import apps"

Now you should see apps directory inside your new repository. git log should show all relevant historical commit messages.

Note: as Chris noted below in the comments, for newer versions of git (>= 2.9), you need to specify --allow-unrelated-histories with git merge.

5 - Merge libs repo next in the same way:

git remote add libs-repo ../libs-repo
git fetch libs-repo
git merge -s ours --no-commit libs-repo/master # see above note.
git read-tree --prefix=libs -u libs-repo/master
git commit -m "import libs"

Continue if you have more than 2 repos to merge.
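
Steps 3 and 4 can be sketched end to end on throwaway repositories, with --allow-unrelated-histories added as the note suggests for git >= 2.9. The repository contents here are hypothetical, the apps-repo below only stands in for the filtered clone from step 1, and the trailing slash in --prefix=apps/ is an assumption made to stay compatible with older Git versions.

```shell
#!/bin/sh
# Sketch: steps 3 and 4 with --allow-unrelated-histories (git >= 2.9).
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the filtered apps-repo from step 1 (hypothetical content).
git init -q apps-repo
cd apps-repo
git symbolic-ref HEAD refs/heads/master   # make sure the branch is "master"
git config user.name "Demo User"
git config user.email "demo@example.com"
mkdir AAA
echo "app code" > AAA/main.txt
git add AAA/main.txt
git commit -q -m "apps: add AAA"
cd ..

# Step 3: the brand new repo with its one required commit.
git init -q my-desired-repo
cd my-desired-repo
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "demo" > README.md
git add README.md
git commit -q -m "initial commit"

# Step 4: record a merge without taking any content, then read the
# other tree in under apps/ and commit the result.
git remote add apps-repo ../apps-repo
git fetch -q apps-repo
git merge -s ours --no-commit --allow-unrelated-histories apps-repo/master >/dev/null 2>&1
git read-tree --prefix=apps/ -u apps-repo/master
git commit -q -m "import apps"

imported_files=$(git ls-files | tr '\n' ' ')
commit_count=$(git rev-list --count HEAD)
echo "files:   $imported_files"
echo "commits: $commit_count"
```

The history should then contain the initial commit, the imported apps commit, and the merge commit that ties them together.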

Reference: Merge a subdirectory of another repository with git

10 Comments

Since git 2.9 you need to use --allow-unrelated-histories on the merge commands. Otherwise this appears to have worked well for me.
Genius! Thank you so much for this. The initial answers I'd looked at, using a tree filter on a very large repository, had git predicting taking over 26hrs to complete the git rewrites. Much happier with this simple, but repeatable approach and have successfully moved 4 sub folders into a new repo with all expected commit history.
You can use the first commit for a "Initial commit" which adds .gitignore and README.md files.
Unfortunately this approach seems to break tracking-history for the files added in the git merge .. git read-tree step, as it records them as newly-added files and all of my git guis don't make the connection to their earlier commits.
@ksadjad, No idea, to be honest. The central point of the manual merge is to select the directories to form the new repo and keep their commit histories. I am not sure how to handle such situation where a commit put files into dirA, dirB, dirDrop and only dirA and dirB are chosen for the new repo, how should the commit history relate to the original one.
28

Why would you want to run filter-branch more than once? You can do it all in one sweep, so no need to force it (note that you need extglob enabled in your shell for this to work):

git filter-branch --index-filter "git rm -r -f --cached --ignore-unmatch $(ls -xd apps/!(AAA) libs/!(XXX))" --prune-empty -- --all 

This should get rid of all the changes in the unwanted subdirectories and keep all your branches and commits (unless they only affect files in the pruned subdirectories, by virtue of --prune-empty) - no issue with duplicate commits etc.

After this operation the unwanted directories will be listed as untracked by git status.

The $(ls ...) is necessary so that the extglob is evaluated by your shell instead of the index filter, which uses the sh builtin eval (where extglob is not available). See How do I enable shell options in git? for further details on that.
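
The expansion can be checked in isolation before running filter-branch. The sketch below is bash-only and uses a hypothetical scratch directory that mirrors the layout from the question; it shows which paths the !(…) patterns select. Note that in the answer's command the expansion happens once in the invoking shell, against the current checkout, so directories that exist only in older commits would not be listed.

```shell
#!/usr/bin/env bash
# Sketch: how the extglob patterns expand in the calling shell (bash only).
shopt -s extglob   # must be enabled before the pattern line is parsed

demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p apps/AAA apps/BBB apps/CCC libs/XXX libs/YYY libs/ZZZ

# !(AAA) matches every entry in apps/ except AAA; likewise for libs/.
set -- apps/!(AAA) libs/!(XXX)
unwanted="$*"
echo "$unwanted"
```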

12 Comments

Interesting idea. I have a similar problem but could not get it to work, see stackoverflow.com/questions/8050687/…
This is pretty much what I needed, though I had sprinkling of both files and folders across my repo... Thanks :)
hm. even with extglob turned on I'm getting an error near my parenthesis: syntax error near unexpected token `(' my command looks like: git filter-branch -f --index-filter "git rm -r -f --cached --ignore-unmatch src/css/themes/!(some_theme*)" --prune-empty -- --all an ls with src/css/themes/!(some_theme*) returns all the other themes so extglob does appear to be working...
@MikeGraf I don't think that will give the desired result: escaping would match a literal "!" etc. in your path.
@david-smiley’s (more recent) answer uses a very similar approach, but has the advantage of relying exclusively on git commands, and thus isn’t as susceptible to differences in how the ls is interpreted across operating systems, as @Bae discovered.
21

Answering my own question here... after a lot of trial and error.

I managed to do this using a combination of git subtree and git-stitch-repo. These instructions are based on:

First, I pulled out the directories I wanted to keep into their own separate repository:

cd origRepo
git subtree split -P apps/AAA -b aaa
git subtree split -P libs/XXX -b xxx
cd ..
mkdir aaaRepo
cd aaaRepo
git init
git fetch ../origRepo aaa
git checkout -b master FETCH_HEAD
cd ..
mkdir xxxRepo
cd xxxRepo
git init
git fetch ../origRepo xxx
git checkout -b master FETCH_HEAD

I then created a new empty repository, and imported/stitched the last two into it:

cd ..
mkdir newRepo
cd newRepo
git init
git-stitch-repo ../aaaRepo:apps/AAA ../xxxRepo:libs/XXX | git fast-import

This creates two branches, master-A and master-B, each holding the content of one of the stitched repos. To combine them and clean up:

git checkout master-A
git pull . master-B
git checkout master
git branch -d master-A
git branch -d master-B

Now I'm not quite sure how/when this happens, but after the first checkout and the pull, the code magically merges into the master branch (any insight on what's going on here is appreciated!)

Everything seems to have worked as expected, except that if I look through the newRepo commit history, there are duplicates when the changeset affected both apps/AAA and libs/XXX. If there is a way to remove duplicates, then it would be perfect.

4 Comments

Neat tools you found here. Insight on "checkout": "git pull" is the same as "git fetch && git merge". The "fetch" part is innocuous since you are "fetching locally". So I think this checkout command is the same as "git merge master-B", which is a bit more self-evident. See kernel.org/pub/software/scm/git/docs/git-pull.html
Unfortunately the git-stitch-repo tool is broken due to bad dependencies nowadays.
@Henrik What problem were you experiencing exactly? It works for me, although I had to add export PERL5LIB="$PERL5LIB:/usr/local/git/lib/perl5/site_perl/" to my bash config so that it could find Git.pm. Then I installed it with cpan.
It's possible to use git subtree add to perform this task. See stackoverflow.com/a/58253979/1894803
7

I have written a git filter to solve exactly this problem. It has the fantastic name of git_filter and is located on GitHub here:

https://github.com/slobobaby/git_filter

It is based on the excellent libgit2.

I needed to split a large repository with many commits (~100000) and the solutions based on git filter-branch took several days to run. git_filter takes a minute to do the same thing.

Comments

7

Use 'git splits' git extension

git splits is a bash script that wraps git filter-branch; I created it as a git extension, based on jkeating's solution.

It was made exactly for this situation. For your error, try using the git splits -f option to force removal of the backup. Because git splits operates on a new branch, it won't rewrite your current branch, so the backup is extraneous. See the readme for more detail and be sure to use it on a copy/clone of your repo (just in case!).

  1. Install git splits.

  2. Split the directories into a local branch:
    #change into your repo's directory
    cd /path/to/repo
    #checkout the branch
    git checkout XYZ
    #split multiple directories into new branch XYZ
    git splits -b XYZ apps/AAA libs/ZZZ

  3. Create an empty repo somewhere. We'll assume we've created an empty repo called xyz on GitHub that has the path: [email protected]:simpliwp/xyz.git

  4. Push to the new repo:
    #add a new remote origin for the empty repo so we can push to the empty repo on GitHub
    git remote add origin_xyz [email protected]:simpliwp/xyz.git
    #push the branch to the empty repo's master branch
    git push origin_xyz XYZ:master

  5. Clone the newly created remote repo into a new local directory:
    #change current directory out of the old repo
    cd /path/to/where/you/want/the/new/local/repo
    #clone the remote repo you just pushed to
    git clone [email protected]:simpliwp/xyz.git

3 Comments

It does not seem to be possible to add files to the split and update them later, right?
This seems too slow to run on my repo with tons of commits
git-split seems to use --index-filter, which is extremely slow compared to --subdirectory-filter. For some repos it may still be a viable option, but for big repos (multiple gigabytes, 6-digit commits) --index-filter effectively takes weeks to run, even on dedicated cloud hardware.
6
git clone [email protected]:thing.git
cd thing
git fetch
for originBranch in `git branch -r | grep -v master`; do
    branch=${originBranch:7:${#originBranch}}
    git checkout $branch
done
git checkout master
git filter-branch --index-filter 'git rm --cached -qr --ignore-unmatch -- . && git reset -q $GIT_COMMIT -- dir1 dir2 .gitignore' --prune-empty -- --all
git remote set-url origin [email protected]:newthing.git
git push --all
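
The loop turns each remote-tracking name into a local branch name by cutting off the first seven characters, which is the length of "origin/". A minimal sketch of that step follows, using hypothetical branch names in place of the git branch -r output. It uses the POSIX prefix-removal form ${originBranch#origin/} as an alternative to the offset-based expansion, which is a bash feature; this is a swapped-in variant, not the answer's exact code.

```shell
#!/bin/sh
# Sketch: derive local branch names from remote-tracking names.
# The branch names below are hypothetical sample data standing in for
# the output of "git branch -r | grep -v master".
local_names=""
for originBranch in origin/feature-x origin/release-1.0; do
    # Strip the leading "origin/" (POSIX alternative to ${originBranch:7}).
    branch=${originBranch#origin/}
    local_names="$local_names$branch "
done
echo "$local_names"
```

In the real script each derived name is passed to git checkout, which creates a local branch tracking the remote one so that filter-branch -- --all rewrites it too.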

2 Comments

Reading through all the other comments got me on the right track. However, your solution just works. It imports all branches, and works with multiple directories! Great!
The for loop is worth acknowledging, since other similar answers don’t include it. If you don’t have a local copy of each branch in your clone, then filter-branch won’t account for them as part of its rewrite, which could potentially exclude files introduced in other branches, but not yet merged with your current branch. (Though it‘s also worth doing a git fetch on any branches you have previously checked out to ensure that they remain current.)
3

Yeah. Force overwriting the backup by using the -f flag on subsequent calls to filter-branch to override that warning. :) Otherwise I think you have the solution (that is, eradicate an unwanted directory at a time with filter-branch).
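
A minimal sketch of that one-directory-at-a-time approach on a throwaway repository is shown below. The directory layout and the remove_dir helper are hypothetical; the point is only that the second pass is refused without -f and accepted with it. FILTER_BRANCH_SQUELCH_WARNING is set on the assumption that a recent Git is in use.

```shell
#!/bin/sh
# Sketch: remove unwanted directories one pass at a time, using -f on
# later passes to overwrite the backup in refs/original/.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the filter-branch warning delay

demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
for dir in apps/AAA apps/BBB libs/XXX libs/YYY; do
    mkdir -p "$dir"
    echo "content of $dir" > "$dir/file.txt"
    git add "$dir/file.txt"
    git commit -q -m "add $dir"
done

# Hypothetical helper: $1 = extra filter-branch option (may be empty),
# $2 = directory to drop from every commit.
remove_dir() {
    git filter-branch $1 --index-filter \
        "git rm -r -q --cached --ignore-unmatch -- $2" \
        --prune-empty -- --all >/dev/null 2>&1
}

remove_dir "" apps/BBB             # first pass creates the backup
if remove_dir "" libs/YYY; then    # second pass without -f
    second_pass_without_force=accepted
else
    second_pass_without_force=refused
fi
remove_dir -f libs/YYY             # -f overwrites the previous backup

kept_files=$(git ls-files | tr '\n' ' ')
echo "second pass without -f: $second_pass_without_force"
echo "kept files: $kept_files"
```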

Comments

1

I think it is easier to just export the commits that touch those paths:

git log --pretty=email --patch-with-stat --reverse --full-index --binary -- /apps/{AAA,BBB,CCC} /libs/{XXX,YYY,ZZZ} > subdir.patch 

and then import those commits into a new repo:

git am < subdir.patch 
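
The export and import can be tried on throwaway repositories first. In the sketch below the repository names, files, and commits are hypothetical, and the paths are given relative to the repository root and limited to the two directories to keep, which is an assumption about the intended use rather than the exact paths from the command above.

```shell
#!/bin/sh
# Sketch: export the commits touching selected paths as patches, then
# replay them into a fresh repository with git am.
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical source repository with one commit per directory.
git init -q source-repo
cd source-repo
git config user.name "Demo User"
git config user.email "demo@example.com"
for dir in apps/AAA apps/BBB libs/XXX; do
    mkdir -p "$dir"
    echo "content of $dir" > "$dir/file.txt"
    git add "$dir/file.txt"
    git commit -q -m "add $dir"
done

# Oldest first, in mailbox format, only commits touching the wanted paths.
git log --pretty=email --patch-with-stat --reverse --full-index --binary \
    -- apps/AAA libs/XXX > ../subdir.patch
cd ..

# Replay the patches into a new, empty repository.
git init -q new-repo
cd new-repo
git config user.name "Demo User"
git config user.email "demo@example.com"
git am -q < ../subdir.patch

imported_files=$(git ls-files | tr '\n' ' ')
commit_count=$(git rev-list --count HEAD)
echo "files:   $imported_files"
echo "commits: $commit_count"
```

The replayed commits get new hashes, since git am creates fresh commits from the patch contents.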

If you have merge commits that cannot be rebased, you may want to try with -m --first-parent:

git log --pretty=email --patch-with-stat --reverse --full-index --binary -m --first-parent -- <your paths> 

1 Comment

What I like about this method: it's quick, easy, doesn't require a bunch of commands that I have to look up to make sure it works for my case, and, the best thing: I can edit the patch file so that the paths are what I want them to be if I only want to keep a few files or directories. Doing it this way preserves the full history without the dreaded "oh, you moved the file, I don't know what the history is any more" issue that so many other methods have.
-6

Delete the backup present under the .git directory in refs/original like the message suggests. The directory is hidden.

Comments
