May 4, 2012

Git :: Removing files from all commits

Alright… this is just a quick hint on the process I used to nuke some committed files from the entire commit history:

git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch my_files" HEAD
rm -rf .git/refs/original/
git reflog expire --all
git gc --aggressive --prune
git push origin +master

And here comes the explanation:

git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch my_files" HEAD

Our action is to rewrite our branch, hence we need to use the top level “filter-branch” command.

--index-filter: the filter runs against the index of each commit, so there is no need to check out the working tree for every commit. That makes it much faster than --tree-filter, and it is where we issue the “git rm -rf …”

git rm: quite obvious, it removes the matching files

--cached: only remove the paths from the index, leaving the files in the working tree untouched

--ignore-unmatch: exit with a zero status even if no files matched, so the filter does not abort on commits that never contained the files

HEAD: the revisions to rewrite, meaning every commit reachable from HEAD (our current branch), not just the last commit
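To see the rewrite in action without risking a real repository, here is a minimal sketch that builds a throwaway repo and runs the same command on it. The file names (keep.txt, secret.txt) and the identity settings are made up for illustration, and FILTER_BRANCH_SQUELCH_WARNING is only there because newer Git versions print a warning and pause before filter-branch starts.

```shell
#!/bin/sh
# Sketch: rewrite a throwaway repository and check the file is gone from history.
# secret.txt, keep.txt and the identity below are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

echo "keep" > keep.txt
echo "oops" > secret.txt
git add keep.txt secret.txt
git commit -q -m "first commit, includes secret.txt"
echo "more" >> keep.txt
git commit -q -am "second commit"

# Same command as in the post, with secret.txt standing in for my_files.
FILTER_BRANCH_SQUELCH_WARNING=1 \
  git filter-branch --index-filter \
  "git rm -rf --cached --ignore-unmatch secret.txt" HEAD >/dev/null 2>&1

# Count the commits on the rewritten branch that still touch secret.txt.
remaining=$(git log --oneline HEAD -- secret.txt | wc -l | tr -d ' ')
commits=$(git rev-list --count HEAD)
echo "commits on branch: $commits, still touching secret.txt: $remaining"
```

Both commits survive the rewrite because each still contains keep.txt; only the unwanted path is dropped from them.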

rm -rf .git/refs/original/

Even after rewriting the branch, filter-branch keeps a backup of the original refs in refs/original, and the old commits stay reachable through it, so we need to delete it

git reflog expire --all

Here is where it gets interesting. You see, every time HEAD or a branch tip moves, git records the previous position in the reflog. Think of it as a safety net: a log of the commit hashes you have been at. The removed files could therefore still be restored through the reflog, hence “expire --all”. Note that without an explicit --expire=now, only entries older than the default expiry (90 days) are dropped.

git gc --aggressive --prune

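Oh my… gc what? “Garbage collector”? Well, we have rewritten the branch and purged the reflog, which leaves us with a lot of unreferenced objects, so it is time to clean up and save some disk space. Without a date, --prune only removes loose objects older than the default of two weeks; --prune=now removes them right away.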
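The three cleanup steps can be checked end to end in a scratch repo. This is only a sketch under a few assumptions: the file names are invented, the backup refs are deleted through git update-ref instead of rm -rf (a swapped-in alternative, not what I ran above), and --expire=now / --prune=now are used so nothing waits for git's default grace periods.

```shell
#!/bin/sh
# Sketch: confirm that the unwanted blob really leaves the object store.
# File names and identity are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@example.com"
git config user.name "Example"
echo "keep" > keep.txt
echo "oops" > secret.txt
git add keep.txt secret.txt
git commit -q -m "initial"

# Remember the blob ID of the unwanted file so we can look for it later.
blob=$(git rev-parse HEAD:secret.txt)

FILTER_BRANCH_SQUELCH_WARNING=1 \
  git filter-branch --index-filter \
  "git rm -rf --cached --ignore-unmatch secret.txt" HEAD >/dev/null 2>&1

# After the rewrite the blob is still stored: refs/original and the
# reflog keep the old commit reachable.
if git cat-file -e "$blob" 2>/dev/null; then before=present; else before=gone; fi

# 1. Drop the backup refs written by filter-branch.
git for-each-ref --format='%(refname)' refs/original/ |
while read -r ref; do
  git update-ref -d "$ref"
done

# 2. Expire every reflog entry, regardless of age.
git reflog expire --expire=now --all

# 3. Repack and prune unreachable objects immediately.
git gc --quiet --aggressive --prune=now

if git cat-file -e "$blob" 2>/dev/null; then after=present; else after=gone; fi
echo "blob after rewrite: $before, after cleanup: $after"
```

Skipping any one of the three steps should leave the blob in place, which is why they belong together.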

git push origin +master

Well… I don’t want to merge and then push, since that would defeat the whole purpose of the previous steps. No one has pulled from this repo yet, so it is safe to force the non-fast-forward update. We have to force it because the rewritten commits no longer descend from the ones on the remote, hence the “+master”
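Here is a small sketch of what the leading + changes, using a local bare repository as a stand-in for origin. The paths, file name and commit messages are hypothetical, and the rewrite is simulated with a simple --amend rather than filter-branch to keep it short.

```shell
#!/bin/sh
# Sketch: a plain push is rejected after a rewrite, "+master" goes through.
# The bare repo in a temp directory plays the role of "origin".
set -e
work=$(mktemp -d)
remote=$(mktemp -d)
git init -q --bare "$remote"

cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is named master
git config user.email "you@example.com"
git config user.name "Example"
git remote add origin "$remote"

echo "one" > file.txt
git add file.txt
git commit -q -m "original history"
git push -q origin master

# Simulate a history rewrite: replace the tip commit with a different one.
git commit -q --amend -m "rewritten history"

# A plain push fails: the remote tip is not an ancestor of our new tip.
if git push -q origin master 2>/dev/null; then
  plain=accepted
else
  plain=rejected
fi

# The leading + forces the update of this one ref.
git push -q origin +master

local_tip=$(git rev-parse master)
remote_tip=$(git --git-dir="$remote" rev-parse master)
echo "plain push: $plain"
if [ "$local_tip" = "$remote_tip" ]; then
  echo "remote master now matches the rewritten branch"
fi
```

The + applies to that single refspec only, so other branches pushed in the same command would still get the normal fast-forward check.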