I am currently trying to figure out how git diff -M<limit> works.
From what I have found out, git diff checks how similar two files are (say fileA in revision 1 and fileC in revision 2) by calculating a similarity score. If the score is >= limit, fileA is considered to have been renamed to fileC, possibly with modifications (if the score is < 100%).
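To make the idea of a similarity score concrete for myself, I wrote a rough stand-in. It is only a sketch under my own assumptions, not git's real algorithm (which, as far as I understand, compares hashed chunks of content rather than whole lines). The `similarity` function is a hypothetical helper: it adds up the lengths of the lines that occur in both files, divides by the size of the larger file, and truncates to a whole percent. It assumes single-byte characters and that every line ends with a newline.

```shell
#!/bin/sh
# similarity OLD NEW: rough stand-in for a similarity index (NOT the real
# git algorithm). Sums the lengths of lines found in both files, divides
# by the size of the larger file, and truncates to a whole percent.
similarity() {
    awk '
        # First file: remember each line and add up its length (+1 for "\n").
        NR == FNR { seen[$0]++; old += length($0) + 1; next }
        # Second file: a line counts as common if an unmatched copy exists.
        {
            new += length($0) + 1
            if (seen[$0] > 0) { seen[$0]--; common += length($0) + 1 }
        }
        END {
            max = (old > new) ? old : new
            if (max == 0) print 100
            else print int(common * 100 / max)
        }
    ' "$1" "$2"
}

# Example: a 7-line file versus the same file without its first line.
tmp=$(mktemp -d)
printf '%s\n' a b c d e f g > "$tmp/fileA"
printf '%s\n' b c d e f g > "$tmp/fileC"
similarity "$tmp/fileA" "$tmp/fileC"   # prints 85
rm -rf "$tmp"
```

With these inputs, 12 of the 14 characters survive, which truncates to 85 and would pass a limit of 80%.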
Then I asked myself: what if several files in the directory have the same SHA-1 hash? How does git know which one is the renamed (and changed) version?
To find this out, I tried the following:
First, I created two files with 7 lines ("a", "b", "c", "d", "e", "f", "g"):

    vi fileA
    vi fileB

Then I added them to the repository and committed:

    git add fileA fileB
    git commit -m "Added fileA and fileB"
    [master ffc8964] Added fileA and fileB
     2 files changed, 6 insertions(+)
     create mode 100644 tests/fileA
     create mode 100644 tests/fileB

Next, I renamed fileA to fileC using git mv and deleted the first line in both fileB and fileC. After that, I committed the changes:

    git mv fileA fileC
    vi fileB
    vi fileC
    git commit -a -m "Renamed and changed files"
    [master 57ff82a] Renamed and changed files
     2 files changed, 2 deletions(-)
     rename tests/{fileA => fileC} (85%)

fileB and fileC now look like this:

    b
    c
    d
    e
    f
    g

What I expected now is that the checksums of fileB and fileC are equal:

    git hash-object fileB fileC
    9fbb6235d2d7eb798268d4537acebea297321241
    9fbb6235d2d7eb798268d4537acebea297321241

Indeed they are :-)
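To convince myself why the two IDs match, I recomputed a blob ID by hand. This is a minimal sketch: `blob_sha1` is a hypothetical helper of mine, and it assumes `sha1sum` from GNU coreutils is available and that the repository uses the default SHA-1 object format. The ID is the SHA-1 of the header `blob <size>\0` followed by the file content, so the file name never enters the hash.

```shell
#!/bin/sh
# blob_sha1 FILE: hypothetical helper that recomputes a blob ID by hand.
# It hashes the header "blob <size>\0" followed by the file content.
blob_sha1() {
    size=$(wc -c < "$1" | tr -d ' ')
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d ' ' -f 1
}

# Two files with different names but identical content.
tmp=$(mktemp -d)
printf '%s\n' b c d e f g > "$tmp/fileB"
cp "$tmp/fileB" "$tmp/fileC"
blob_sha1 "$tmp/fileB"
blob_sha1 "$tmp/fileC"   # same ID as the line above: the name is not hashed
rm -rf "$tmp"
```

Inside a repository, the result can be compared against `git hash-object <file>` for the same file.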
So how can git diff now tell which file is the renamed one? Since fileC has been changed, the commit created a new blob for it, so the checksums of fileC and fileA differ as well (obviously).
I tried it:

    git diff -M80% HEAD master~1

The output, however, confused me :-(

    diff --git a/tests/fileC b/tests/fileA
    similarity index 85%
    rename from tests/fileC
    rename to tests/fileA
    index 9fbb623..f9d9a01 100644
    --- a/tests/fileC
    +++ b/tests/fileA
    @@ -1,3 +1,4 @@
    +a
     b
     c
     d
    diff --git a/tests/fileB b/tests/fileB
    index 9fbb623..f9d9a01 100644
    --- a/tests/fileB
    +++ b/tests/fileB
    @@ -1,3 +1,4 @@
    +a
     b
     c
     d

Apparently git diff DID find out that fileA has been renamed to fileC.
But how? Did git save some kind of connection between fileA and fileC?
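One experiment that might narrow this down is to run the same reverse diff with rename detection switched off and compare. The script below is only a sketch: `rename_probe` is a hypothetical helper that rebuilds the scenario in a throwaway repository (directly in the repository root instead of tests/, with a placeholder identity, and using HEAD~1 instead of master~1 so the default branch name does not matter). If the rename entry turns into a separate deletion and addition under `--no-renames`, that would suggest the pairing is worked out when the diff runs rather than stored in the commit.

```shell
#!/bin/sh
# rename_probe: hypothetical helper that rebuilds the scenario in a
# throwaway repository and prints the reverse diff twice, first without
# and then with rename detection.
rename_probe() (
    set -e
    repo=$(mktemp -d)
    cd "$repo"
    git init -q .
    git config user.name "Example"                   # placeholder identity
    git config user.email "example@example.invalid"

    # First commit: two identical 7-line files.
    printf '%s\n' a b c d e f g > fileA
    cp fileA fileB
    git add fileA fileB
    git commit -q -m "Added fileA and fileB"

    # Second commit: rename fileA to fileC, drop the first line of both.
    git mv fileA fileC
    printf '%s\n' b c d e f g > fileB
    cp fileB fileC
    git commit -q -a -m "Renamed and changed files"

    echo "--no-renames:"
    git diff --no-renames --name-status HEAD HEAD~1
    echo "-M80%:"
    git diff -M80% --name-status HEAD HEAD~1

    cd /
    rm -rf "$repo"
)

rename_probe
```

In the `--name-status` output, a leading A marks a path that exists only on the second side, D a path that exists only on the first side, M a modified path, and R a detected rename. Note that fileB exists on both sides, so it only ever shows up as modified.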