I have a commit with a large number (hundreds) of similar hunks, and I'd like to list each unique hunk in the commit in order to compare them.
I wrote the following GNU awk script, which writes each hunk to a unique file (hunk-[md5-of-hunk].txt):
    BEGIN {
        hunk = ""
        buildhunk = 0
    }

    # Write the current hunk to hunk-[md5].txt unless an identical hunk was already seen.
    function writeHunk() {
        if (length(hunk) > 0) {
            print hunk > "hunk.tmp"
            close("hunk.tmp")
            cmd = "cat hunk.tmp | md5"
            cmd | getline md5
            close(cmd)
            if (!(md5 in hunkfiles)) {
                hunkfilename = "hunk-" md5 ".txt"
                print hunk > hunkfilename
                hunkfiles[md5] = hunkfilename
            }
        }
    }

    # A hunk header or a new file header ends the previous hunk.
    /^@@|^diff/ {
        writeHunk()
        hunk = ""
        buildhunk = ($1 == "@@") ? 1 : 0
    }

    # Collect context, added and removed lines while inside a hunk.
    /^[ +-]/ {
        if (buildhunk) {
            hunk = hunk $0 "\n"
        }
    }

    END {
        writeHunk()
        system("rm hunk.tmp")
        for (md5 in hunkfiles) {
            print hunkfiles[md5]
        }
    }

I then run this with

    git show [commit-SHA] | awk -f my_script.awk

which creates and lists the resulting files. It works for my purposes, but is there a way to do this more efficiently using git's plumbing commands?
Example
Suppose the commit's patch looks like this (reduced to 1 line of context below for clarity's sake):
    diff --git a/file1.txt b/file1.txt
    index a3fb2ed..4d6f587 100644
    --- a/file1.txt
    +++ b/file1.txt
    @@ -3,2 +3,3 @@ context
     context
    +added line
     context
    @@ -7,2 +8,3 @@ context
     context
    +added line
     context
    @@ -11,2 +13,3 @@ context
     context
    +added line
     context
    @@ -15,2 +18,3 @@ context
     context
    +different added line
     context
    @@ -19,2 +23,3 @@ context
     context
    +different added line
     context
    @@ -23,2 +28,3 @@ context
     context
    +different added line
     context
    @@ -27,2 +33,3 @@ context
     context
    +even more different added line
     context
    @@ -31,2 +38,3 @@ context
     context
    +even more different added line
     context

I want to be able to identify that there are only 3 unique hunks, and see what they are. Namely:
Unique hunk 1:

     context
    +added line
     context

Unique hunk 2:

     context
    +different added line
     context

Unique hunk 3:

     context
    +even more different added line
     context
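
For comparison, here is a minimal Python sketch of the same deduplication done in memory, with no temp file and no external md5 process. It is only an illustration of the logic, not a git-plumbing solution. The names `unique_hunks`, `add_hunk` and `HUNK_HEADER` are hypothetical, and it assumes plain unified diff output (not the combined `@@@` format that merge commits produce).

```python
import hashlib
import re

# Matches a unified-diff hunk header such as "@@ -3,2 +3,3 @@ context".
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@")


def add_hunk(unique, body):
    """Store the hunk body under its MD5 digest if it has not been seen yet."""
    if body:
        text = "\n".join(body) + "\n"
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        unique.setdefault(digest, text)


def unique_hunks(patch_text):
    """Return {md5 hex digest: hunk body} for each distinct hunk body."""
    unique = {}
    body = None  # None means we are currently outside any hunk
    for line in patch_text.splitlines():
        if HUNK_HEADER.match(line):
            # A new hunk starts: store the previous one and begin collecting.
            add_hunk(unique, body)
            body = []
        elif line.startswith("diff "):
            # A new file header ends the current hunk; skip ---/+++ lines.
            add_hunk(unique, body)
            body = None
        elif body is not None and line[:1] in (" ", "+", "-"):
            body.append(line)
    add_hunk(unique, body)
    return unique
```

Passing the text of `git show [commit-SHA]` to `unique_hunks` (for example, read from `sys.stdin`) should return one entry per distinct hunk body, in first-seen order, since the hunk header line with its line numbers is left out of the hash.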