3

I've committed a bunch of sensitive data to my local repo that has not been published yet.

The sensitive data is scattered across the project in different folders and I want to remove all these completely from git history.

All of the concerning folders have the same name, and are at the same level in the directory in different folders. Following is a sample of my folder structure:

root
  folder1
    ./sensitiveData
  folder2
    ./sensitiveData
  folder3
    ./sensitiveData

Using the following command, I am able to delete the folders containing sensitive data one at a time:

git filter-branch -f --index-filter 'git rm -r --cached --ignore-unmatch javascript/folder1/.sensitiveData' --prune-empty HEAD 

But I want to delete all the folders containing sensitive data in one go, because there are too many of them, and I would like to learn how this works.

But using the following command, nothing is rewritten and I am warned that 'refs/heads/master' is unchanged:

git filter-branch -f --index-filter 'git rm -r --cached --ignore-unmatch javascript/*/.sensitiveData' --prune-empty HEAD 

As I see it, there are two strategies:

  1. Either my pattern is somehow wrong and I need to change it.
  2. Or I should do some looping with bash.

Option one seems more sensible if possible.

4
  • Instead of a one-liner, it seems you need a full script where you do whatever you want to. Commented Aug 22, 2019 at 20:07
  • so the * does not expand in this context? Commented Aug 22, 2019 at 20:08
  • I dunno, but even if it does, with a lot of folders you may have too long command line. This approach is initially wrong. Commented Aug 22, 2019 at 20:09
  • The addresses are relative so it won't go deeper than the three levels that you see in the commands. Commented Aug 22, 2019 at 20:12

2 Answers

2

Your command, when you run it, is first evaluated by your shell. So with:

'git rm -r --cached --ignore-unmatch javascript/*/.sensitiveData' 

the single quotes protect the entire thing from the shell, and pass it to git filter-branch as the --index-filter to be used later. The single quotes are gone at this point.

Here's the problem: filters given to git filter-branch get evaluated at filtering-time by another shell (technically, the shell that's running git filter-branch itself). This other shell evals the command:

eval $filter 

So now this second shell re-interprets:

git rm -r --cached --ignore-unmatch javascript/*/.sensitiveData 

It breaks up the arguments at spaces, expands the asterisk based on the current working directory, and invokes git rm -r --cached --ignore-unmatch on the result of the expansion.

If the expansion succeeds, one thing happens; if not, something else happens. Precisely what happens depends on the shell (bash can be configured to behave in several different ways; POSIX sh is more predictable).

The actual current working directory for an --index-filter is generally empty so the expansion will probably fail. This should, in most cases, pass the asterisk on unchanged to Git. Since the argument to git rm is (mostly / essentially) a pathspec, Git will now do its own expansion. This should have worked, so either the path itself is wrong, or the directory is not empty, or there's something odd about your shell so that the failed expansion didn't pass the literal text javascript/*/.sensitiveData to git rm.
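The difference between a failed and a successful expansion can be seen outside of Git. The following is a minimal sketch, assuming a POSIX sh (or bash with its default options, i.e. without nullglob or failglob); the temporary directory is only a stand-in for the filter's working directory and is not part of the original commands.

```shell
# Start in a fresh, empty directory (hypothetical stand-in for the
# directory in which the filter command is evaluated).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Nothing matches yet, so the shell leaves the pattern untouched and the
# command would receive the literal text, asterisk included.
set -- javascript/*/.sensitiveData
unmatched=$1

# Once a matching directory exists, the same unquoted pattern is replaced
# by the real path before the command ever sees it.
mkdir -p javascript/folder1/.sensitiveData
set -- javascript/*/.sensitiveData
matched=$1

printf '%s\n' "$unmatched" "$matched"
```

In the first case it is Git that has to interpret the asterisk as a pathspec; in the second case Git only ever sees the already expanded names.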

You can take some variables out of this equation by using:

'git rm -r --cached --ignore-unmatch javascript/\*/.sensitiveData' 

so that the second shell sees:

git rm -r --cached --ignore-unmatch javascript/\*/.sensitiveData 

which will force the second shell to pass:

javascript/*/.sensitiveData 

directly to git rm. Given that this probably should have worked anyway, though, it's of interest to check whether javascript/*/.sensitiveData would match the right files in the specific commit(s), which you can do kind of clumsily / manually using git ls-tree -r on those commits.
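The manual check can be made a little less clumsy by filtering the git ls-tree -r output. This is only a sketch: list_sensitive is a hypothetical helper name, and the regular expression is my reading of what javascript/*/.sensitiveData is meant to cover (exactly one directory level between javascript and .sensitiveData).

```shell
# list_sensitive COMMIT
# Print every file path recorded in COMMIT that sits below a
# javascript/<one directory>/.sensitiveData/ folder.
# Run it from the top level of the repository so paths are root-relative.
list_sensitive() {
    git ls-tree -r --name-only "$1" |
        grep -E '^javascript/[^/]+/\.sensitiveData/'
}
```

For example, list_sensitive HEAD lists the offending files in the current commit; empty output means that commit contains nothing under such a folder.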

3 Comments

Thanks @torek for your thorough answer. I escaped the asterisk \* and it didn't help. I also ran the command git ls-tree -r javascript/*/.sensitiveData as you suggested and got the following error: fatal: Not a valid object name javascript/folder1/.sensitiveData, which seems to indicate that the pattern here is at least working. Any thoughts?
git ls-tree -r needs a commit hash ID argument. The idea here was to look at one of the commits from which you want these files removed, to make sure that javascript/*/.sensitiveData should match all the files. So let's say that one of the commits with the files you want gone is a123456; use git ls-tree -r a123456 and eyeball the output.
thanks @torek I just went with another solution using bash (and learned some bash along the way). It's a simple for in but it worked perfectly. Already learned a lot from your answer. Will check git ls-tree. Thanks again :D
-1

At the end, what solved my problem was a small bash script using the for in construct.

for name in javascript/*/.sensitiveData
do
  git filter-branch -f --index-filter "git rm -r --cached --ignore-unmatch $name" --prune-empty HEAD
done
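The loop rewrites the whole history once per folder. Since git rm accepts several paths, the same idea can be sketched as a single rewrite. This is a variant I am assuming works for a layout like the one in the question, not something tested on the original repository: purge_sensitive_dirs is a hypothetical name, the glob is expanded against the current checkout (so only folders that exist on disk are covered), and directory names must not contain whitespace or shell metacharacters. With very many folders the resulting command line could also become too long.

```shell
# purge_sensitive_dirs
# Rewrite the history of HEAD once, dropping every
# javascript/*/.sensitiveData directory present in the working tree.
# Assumes: run from the repository root with a clean working tree.
purge_sensitive_dirs() {
    # Expanded here, by the calling shell, against the checked-out files.
    set -- javascript/*/.sensitiveData
    [ -e "$1" ] || { echo "no matching directories" >&2; return 1; }

    # "$*" joins all matched names with spaces, so the filter that is
    # evaluated later is one git rm command with many path arguments.
    git filter-branch -f \
        --index-filter "git rm -r -q --cached --ignore-unmatch $*" \
        --prune-empty HEAD
}
```

Calling purge_sensitive_dirs from the repository root then performs one pass over the history instead of one pass per folder.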
