I recently cloned an SVN repository that used to contain a few binaries which are no longer needed. Unfortunately, I have already pushed it to GitHub with the binaries included. I now want to remove them using 'git filter-branch', but I am running into problems with tags and branches.

Basically, I have created a simple shell script to remove a list of files, which I determined with the following command:

git rev-list --objects --all | grep '\.jar$' > files.txt

The script for removal looks like the following:

#!/bin/sh
while read file_hash file_to_remove
do
    echo "Removing "$file_to_remove;
    git filter-branch --index-filter "git rm --cached --ignore-unmatch $file_to_remove"
    rm -rf .git/refs/original/;
    git reflog expire --all --expire-unreachable=0;
    git repack -A -d;
    git prune
done < $1

I have a few tags (all listed in .git/packed-refs) and one remote under .git/refs/remotes/origin (pointing to the GitHub repo). Running the script above does not have the desired effect ('du -cm' still reports the same size and 'git rev-list' still lists the files) until I manually remove all references from .git/packed-refs and delete the .git/refs/remotes/origin directory.

Naturally, with this approach I lose all tags as well as the ability to push my local changes back to GitHub. Is there anything I have missed, or is there an alternative way to remove files from all branches and tags without destroying my history?

Many thanks in advance, Matthes
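
For reference, here is a minimal sketch of how a single filter-branch call can be pointed at every ref rather than only the current branch. It relies on the documented `--tag-name-filter cat` and `-- --all` arguments; the function name `remove_path_everywhere` and the `TARGET_PATH` variable are made up for this illustration and are not part of the script above. It has not been run against the repository in question, so treat it as a starting point only.

```shell
#!/bin/sh
# Hypothetical helper (name and variable are assumptions for illustration).
# Removes one path from the history of every ref, not only the current branch.
remove_path_everywhere() {
    # Exported so the filter command, which filter-branch evaluates itself,
    # can read the path without extra quoting.
    TARGET_PATH=$1
    export TARGET_PATH
    # --tag-name-filter cat keeps tag names and moves them to the rewritten
    # commits; "-- --all" asks for all refs to be rewritten.
    git filter-branch -f \
        --index-filter 'git rm --cached --ignore-unmatch -- "$TARGET_PATH"' \
        --tag-name-filter cat \
        -- --all
}
```

The old objects would still be kept alive by .git/refs/original/ and the reflog until those are cleaned up, as the script above already does.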

1 Answer

I ended up using the BFG Repo-Cleaner on a bare mirror clone of the repository (git clone --mirror repo-url). It goes through every branch and tag, leaving each one working, and it is much faster than filter-branch. I hope this helps other people with similar issues.

Here is my wrapper script:

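#!/bin/bash
# usage: ./remove_files.sh file_list.txt bare-repo-dir
# Each line of file_list.txt is "<object-hash> <path>".
while read -r file_hash file_to_remove
do
    echo "Removing $file_to_remove"
    # BFG matches on file names, not paths, so keep only the last path component.
    lastFile=$(echo "$file_to_remove" | awk -F/ '{print $NF}')
    java -jar bfg.jar --delete-files "$lastFile" "$2"
done < "$1"
# Drop the now-unreferenced objects from the bare repository.
cd "$2" || exit 1
git gc --prune=now --aggressive
cd ..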
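
The wrapper above calls BFG once per listed path after reducing the path to its file name, so a name that appears under several directories is processed repeatedly. As a rough sketch, the list could be reduced to distinct file names first. The helper name `unique_basenames` is hypothetical, and the sketch assumes the "hash path" line format produced by git rev-list --objects.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): reads "hash path"
# lines on stdin and prints each distinct file name once.
unique_basenames() {
    # The first expression drops the leading object hash; the second drops
    # everything up to the last slash, leaving only the file name.
    sed -e 's/^[^ ]* //' -e 's|.*/||' | sort -u
}
```

Its output (for example from `unique_basenames < files.txt`) could then be fed to the loop in place of the raw list.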

3 Comments

Very glad you like the tool @matthes! Out of interest, how many different files did you need to remove? The "--delete-files" switch accepts glob expressions, and in general it's better to do just one big run of The BFG. For instance: '--delete-files *.{xml,exe}'
@Roberto: Good hint. Indeed, in the end I only removed (a huge list of) .jar files from the repo. So I guess doing it via "--delete-files *.jar" would have been even faster (and safer as well?).
Yup, "--delete-files *.jar" would do the trick! (or alternatively something like "--strip-blobs-bigger-than 512K"). The BFG also updates all the commit ids it finds in your commit messages, so it's nice to do that only once. Whichever approach you take, the BFG makes sure it doesn't delete anything in your latest commit, so any jars you're still using won't be removed.
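
The single-run approach discussed in these comments might look roughly like the sketch below. The function name `clean_jars` and the `BFG_JAR` variable are made up for illustration. The `git reflog expire` step is an addition that is not in the wrapper script from the answer; it is commonly run before `git gc` after a history rewrite. The repository directory is assumed to be a bare clone created with git clone --mirror.

```shell
#!/bin/sh
# Sketch only: one BFG pass over a mirror clone instead of one pass per file.
# BFG_JAR and clean_jars are hypothetical names chosen for this example.
BFG_JAR=${BFG_JAR:-bfg.jar}

clean_jars() {
    repo_dir=$1   # bare repository, e.g. from: git clone --mirror repo-url
    # The glob is quoted so that BFG, not the shell, expands it.
    java -jar "$BFG_JAR" --delete-files '*.jar' "$repo_dir" || return 1
    (
        cd "$repo_dir" || exit 1
        # Expire reflog entries, then drop the now-unreferenced objects.
        git reflog expire --expire=now --all
        git gc --prune=now --aggressive
    )
}
```

Usage would be something like `clean_jars bare-repo-dir`, followed by a push from inside the mirror once the result has been checked.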