I'm migrating a repository from svn to git.

As a last step, I want to remove tons of unneeded files from the history.

I'm trying the following command:

git filter-branch --prune-empty --index-filter \
  "for file in $(cat files); do git rm -rf --cached --ignore-unmatch ${file}; done" -f

But it says that the argument list is too long.

I could rewrite it like this:

for file in $(cat files); do
  git filter-branch --prune-empty --index-filter \
    "git rm -rf --cached --ignore-unmatch ${file}" -f
done

But that runs filter-branch once per file, and the history is long, so it would take far too much time.

Is there a faster way to filter-branch removing lots of files?
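The "argument list too long" error comes from the double quotes: the outer shell expands `$(cat files)` before `git filter-branch` even starts, so the whole file list is pasted into one filter argument. A minimal single-pass sketch, assuming `files` lists one path per line and no path contains a newline or a single quote: only the list's absolute path goes into the filter string, and `xargs` feeds the paths to `git rm` in batches inside each rewritten commit. `strip_listed_paths` is a hypothetical helper name, and unlike the commands above it rewrites all refs (`-- --all`), not just the current branch.

```shell
# Hypothetical helper: remove every path listed in a file from all history
# in ONE filter-branch pass. Assumes one path per line, no newlines or
# single quotes in the paths.
strip_listed_paths() {
  local list
  # filter-branch runs the index filter from a temporary directory,
  # so the list has to be referenced by absolute path.
  list="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"

  # Only the short path to the list is embedded in the filter string; the
  # paths themselves are read at filter time, so the argument stays small.
  # tr turns newlines into NULs so xargs -0 keeps paths with spaces intact,
  # and xargs splits a long list into as many git rm calls as needed.
  git filter-branch -f --prune-empty --index-filter \
    "tr '\n' '\0' < '$list' | xargs -0 git rm -r -q --cached --ignore-unmatch" \
    -- --all
}
```

Run it from the top of the repository, e.g. `strip_listed_paths files`. Recent Git versions print a warning about filter-branch and pause before rewriting; setting `FILTER_BRANCH_SQUELCH_WARNING=1` skips that. The old history is still kept alive by the backup refs filter-branch writes under `refs/original/` until those are deleted.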

  • Could you consider splitting the repository during the svn-to-git migration? I'm basically asking whether you can refactor the repository. Commented Aug 1, 2013 at 12:16
  • possible duplicate of New repo with copied history of only currently tracked files Commented Aug 1, 2013 at 12:21
  • I did that, but the repo is still too big. My coworkers used to commit binaries like JBoss, the JDK and other things to SVN... a real mess. Commented Aug 1, 2013 at 12:22
  • @caarlos0 Did you read the answers in there about ways to use filter-branch to remove a lot of files? Have you tried them? (There's more than one method). Which ones did you try? Did you see any error messages or other indications of why they might have failed? Commented Aug 1, 2013 at 12:25
  • I've tried several ways, but none worked; I got errors like "file not found" and weird syntax errors. Anyway, perhaps I'll just wait for my for loop to finish. Commented Aug 1, 2013 at 13:04

1 Answer

I'd recommend using The BFG, a simpler, faster alternative to git-filter-branch specifically designed for removing unwanted files from Git history.

You mentioned in your comment that the problem files are mostly big binaries, and The BFG has a specific option for exactly this. Follow the BFG's usage instructions carefully, but the core part is just this:

$ java -jar bfg.jar --strip-blobs-bigger-than 10M my-repo.git 

Any files over 10MB in size (that aren't in your latest commit) will be removed from your Git repository's history. You can then use git gc to clean away the dead data:

$ git gc --prune=now --aggressive 
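`git gc` on its own may not shrink the repository after a rewrite, because reflog entries (and, after a `git filter-branch` run, the backup refs under `refs/original/`) still point at the old commits. The BFG's own instructions run `git reflog expire --expire=now --all` before `git gc` for this reason. A minimal sketch of the full cleanup as a hypothetical helper, `purge_rewritten_objects`; `--aggressive` is left out here because it changes how hard Git works on delta compression, not which objects get removed.

```shell
# Hypothetical helper: make git actually drop objects that a history
# rewrite (BFG or filter-branch) has made unreachable.
# WARNING: after this, the pre-rewrite history cannot be recovered locally.
purge_rewritten_objects() {
  # filter-branch keeps backups of the old refs under refs/original/;
  # delete them, or they keep every old blob reachable.
  git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do git update-ref -d "$ref"; done

  # Reflog entries also point at pre-rewrite commits; expire all of them now.
  git reflog expire --expire=now --all

  # Objects that are now unreachable can be pruned immediately.
  git gc --prune=now
}
```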

The BFG is typically 10 to 720 times faster than git-filter-branch, and generally easier to use.
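Before settling on a `--strip-blobs-bigger-than` threshold, it can help to see which blobs it would catch. A rough sketch using only plain Git (`rev-list --objects` and `cat-file --batch-check`); `list_big_blobs` is a hypothetical name and takes the threshold in bytes. It lists every matching blob in history, including files in the latest commit, which the BFG leaves alone by default, so treat the output as an upper bound.

```shell
# Hypothetical helper: print "<size>\t<blob-id>\t<path>" for every blob in
# history larger than the given number of bytes, biggest first.
list_big_blobs() {
  local min_bytes=$1
  # rev-list --objects prints "<id> <path>" for each object reachable from any ref;
  # cat-file --batch-check adds type and size, keeping the path in %(rest).
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk -v min="$min_bytes" '
      $1 == "blob" && $3 > min {
        # the path starts after "<type> <id> <size> "; substr keeps spaces in it
        print $3 "\t" $2 "\t" substr($0, length($1) + length($2) + length($3) + 4)
      }' |
    sort -k1,1nr
}

# Usage: blobs over 10 MiB anywhere in history
#   list_big_blobs $((10 * 1024 * 1024))
```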

Full disclosure: I'm the author of the BFG Repo-Cleaner.

3 Comments

I ended up waiting... but since this is the only answer, I've accepted it. Thanks!
This is useless for a large number of very small files. Also, is --aggressive a good idea here? See the woes of “git gc --aggressive” (and how git deltas work).
I didn't have access to the BFG, so I ended up using the GitHub method, which is almost identical to the approach in the original question.
