
I have a GitHub repository with commits appearing in the log and visible in the GitHub Desktop app but the changes in the commits are not applied to the code (or visible in Visual Studio with blame).

The commits in question are part of a branch merge that happened a month ago. We merged master into a branch, then merged the branch back into master. The "merge 'master' into 'branch'" commit shows all changes from the merge. Some of these changes are not being applied to the code, and Visual Studio says the affected pages haven't been updated since early last year. Could this have something to do with deleting the branch after merging back into master?

We reviewed all commits after the merge and none of them changed any code on the pages affected by this issue. The pages were not renamed, deleted, or moved.

We tried to revert the merge to see what would happen and most of the commits disappeared in GitHub Desktop (from 58 changed files to 8 files changed), only applying a few changes from the merge. We also tried cherry-picking the commits from this merge and they are applied correctly. Since we can do this to fix the issue, we are more interested in why the commits are not applied to the related code files.

Update: We tried doing a git checkout <sha1> on the commits after the merge commit and discovered the code is different (3 commits later - also a merge) but there are no references to the code files in these commits.

  • Did you fetch/checkout-newer-version on your local? Commented Mar 27, 2019 at 19:21
  • Yes. Even looking at the files on the master branch on GitHub.com shows the same issue. Commented Mar 27, 2019 at 20:02
  • But if you checkout those revisions from a month ago, the changes are there? Commented Mar 27, 2019 at 20:04
  • Yep. If I checkout the commit where we merged master into the feature branch, the changes are there. None of the commits after this commit contain changes to the problematic files. Commented Mar 27, 2019 at 20:11
  • Well... if you go back in history and you can see some changes... and then you go back to the future and the changes are not there, then they have been reverted somehow in the process... in a revision between those two revisions we are talking about. If checking the code is easy to do, you might consider using bisect to know where the change was removed. Commented Mar 27, 2019 at 20:23

1 Answer


It's worth noting that commits are not, and do not contain, changes. Commits contain complete snapshots. Commit viewers deliberately lie to you (and you usually want them to do this) so as to show the commit as a set of changes.

The mechanism behind this is important. A commit is a snapshot (like a weather report that tells you that it's currently 20°C, for instance), and you want to know how different it is now. You have to pick another commit ("yesterday", for instance) to compare against. Then if it was 19°C yesterday, and it's 20°C today, the difference is that it's 1°C warmer. But to get that, you had to pick a previous day to compare to.

Each commit, in Git, is uniquely identified by its hash ID. The hash ID is how Git can get all the files that are in the snapshot and the metadata about the commit, such as who made it, when, and why (the log message). One of the items saved within each commit is the hash ID of the previous commit. This is what Git commit viewers use to construct the difference.

A commit viewer won't show you every line of every file in the commit with hash H. Instead, it will find the commit's predecessor (its parent, in Git's terms). That parent has some different hash G. The viewer extracts both commits, compares them, and tells you: "Between G and H, these files are different, with these changes to these lines." That's usually much shorter, and much more useful, than "here's the full snapshot in H".
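The snapshot-plus-parent idea can be checked in a throwaway repository. This is only a sketch: the file name report.txt, the commit messages, and the user identity are made up for the illustration.

```shell
#!/bin/sh
# Sketch: a commit stores a full snapshot plus its parent's hash;
# a "change" only exists once two snapshots are compared.
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example"            # hypothetical identity
git config user.email "example@example.com"

echo "temperature: 19" > report.txt       # hypothetical file
git add report.txt
git commit -q -m "G: yesterday"
echo "temperature: 20" > report.txt
git commit -q -am "H: today"

# The raw commit object: a tree (the snapshot), a parent hash, and metadata.
git cat-file -p HEAD

# The snapshot in H lists the complete file, not a delta.
git ls-tree -r HEAD

# The viewer's job: look up the parent G and compare G with H.
parent=$(git rev-parse HEAD^)
git diff "$parent" HEAD
```

The git cat-file output contains one tree line and one parent line; the diff at the end is computed on demand from the two snapshots and is not stored in the commit.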

But this breaks down at merges. If we draw a nice linear set of commits:

... <-F <-G <-H <-- you-are-here 

(the arrows point backwards because each commit records its parent; parents don't remember their children) it's easy to compare G vs H. But eventually you combine two lines of development:

       o--...--o--K
      /            \
...--*              M   <-- mainline
      \            /
       o--o--...--L   <-- branch

The main line split apart at some point, with two different people or groups developing. Then we (or someone, anyway) used git checkout mainline; git merge branch and went through the whole scary[1] and magical[2] process of a merge operation, which resulted in this merge commit M.

Commit M is just like any other commit in that it has a snapshot and some metadata. The snapshot is just like any other snapshot. The only thing that's special about M is that, in its metadata, it doesn't just list commit K as its parent. Instead, it lists both commits—K and L—as its two parents.


[1] It's not actually scary.

[2] It's not magical either; see below.
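To see the two parents recorded in a merge commit's metadata, something like the following should work in a scratch repository. The branch names mainline and branch follow the drawing above; the file names and identity are invented for the sketch.

```shell
#!/bin/sh
# Sketch: build *, K, L and merge commit M, then print M's parents.
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline   # name the initial branch
git config user.name "Example"              # hypothetical identity
git config user.email "example@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "*: common starting point"
git branch branch

echo ours > ours.txt
git add ours.txt
git commit -q -m "K: work on mainline"

git checkout -q branch
echo theirs > theirs.txt
git add theirs.txt
git commit -q -m "L: work on branch"

git checkout -q mainline
git merge -q --no-edit branch               # creates merge commit M

# %P prints the parent hashes stored in the commit's metadata.
parents=$(git log -1 --format=%P HEAD)
echo "parents of M:      $parents"
echo "K (first parent):  $(git rev-parse HEAD^1)"
echo "L (second parent): $(git rev-parse HEAD^2)"
```

The first parent is the commit that was checked out when git merge ran; the second is the tip of the branch that was merged in.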


How Git's automatic merge works

Let's take a quick look at git merge and merge conflicts. If there are no conflicts, Git does the whole merge by itself. Usually those cases do not result in this sort of puzzlement, so let's see what happens when there is a conflict.

To start the merge, Git simply compares * vs K (the same way that Git always compares any simple pair of commits) to find out what's different. Then, Git compares * vs L, to find out what's different. Then Git combines the two sets of changes. This is the "to merge" part of the process of merging, or what I like to call merge as a verb. The merged changes are to be applied to the snapshot in commit *.

Remember that each commit holds a snapshot. Commit * has all the files in the state they had at the time someone made *. Commit K has all the files in some other state, and commit L has all the files in some third state. There might even be files in K that aren't in *, and/or in L that aren't in *, and so on, but usually most files are mostly in all three inputs.

Suppose "we" means the people who worked on the K line, and "they" means the people who worked on the L line. We changed files A, B and C. They changed files B, C, and D. Then Git just takes all of our changes to A, and all of their changes to D. That part is easy because we didn't touch D and they didn't touch A. That part of the merge is done.
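The two comparisons Git starts with can be reproduced by hand. This sketch reuses the file names A through D from the example; the file contents and identity are arbitrary, since only which files differ matters for this step.

```shell
#!/bin/sh
# Sketch: find the merge base (*) and list what each side changed.
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Example"              # hypothetical identity
git config user.email "example@example.com"

for f in A B C D; do printf 'line 1\nline 2\nline 3\n' > "$f"; done
git add A B C D
git commit -q -m "*: common starting point"
git branch branch

# "We" (mainline, commit K) change A, B and C.
for f in A B C; do echo "ours" >> "$f"; done
git commit -q -am "K"

# "They" (branch, commit L) change B, C and D.
git checkout -q branch
for f in B C D; do echo "theirs" >> "$f"; done
git commit -q -am "L"
git checkout -q mainline

# The commit drawn as * is what Git calls the merge base.
base=$(git merge-base mainline branch)

ours_changed=$(git diff --name-only "$base" mainline)
theirs_changed=$(git diff --name-only "$base" branch)
echo "we changed:   $(echo $ours_changed)"
echo "they changed: $(echo $theirs_changed)"
```

Files that appear in only one list (A and D here) are taken from that side as-is; files in both lists (B and C) are the ones Git has to combine line by line.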

Now Git figures out which lines we changed within file B, and which lines they changed in the same file. If our lines do not overlap their lines at all—note that Git considers "just touching" as overlapping sometimes—then Git can just apply both changes to file B from commit *. That part of the merge is done too now.

Git figures out which lines we changed in C, and which lines they changed. Uh oh, this time we both changed the same lines. Git writes, to the work-tree, the combination of changes, with conflict markers, and declares the merge to be conflicted.

Since the merge is conflicted, Git stops and gets help from the person who is doing the merge. It's their job to fix this up. There are a lot of ways to fix it up but they all end the same way: whoever is doing the fixing writes the correct version of file C into the work-tree and runs git add C to tell Git: this is the correct result.

Git doesn't check what they wrote, it just takes whatever they put into the final file. If they have completely mucked everything up, for instance by throwing your code away entirely, Git is OK with that! Git assumes they know what they are doing.

They now run git commit or git merge --continue, and Git uses the completed merge snapshot to make merge commit M, which looks like we've drawn it.
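Here is a sketch of how a careless resolution ends up in M without Git objecting. The file contents are invented, and git checkout --theirs is just one way someone might throw away our side; it is not necessarily what happened in your repository.

```shell
#!/bin/sh
# Sketch: a conflicted merge resolved by keeping only "their" file C.
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Example"              # hypothetical identity
git config user.email "example@example.com"

printf 'alpha\nbeta\ngamma\n' > C
git add C
git commit -q -m "*: common starting point"
git branch branch

printf 'alpha\nbeta (our fix)\ngamma\n' > C
git commit -q -am "K: our change to C"

git checkout -q branch
printf 'alpha\nbeta (their edit)\ngamma\n' > C
git commit -q -am "L: their change to C"
git checkout -q mainline

# Both sides changed the same line, so the merge stops with a conflict.
if git merge --no-edit branch >/dev/null 2>&1; then
    echo "unexpected: merge succeeded"
    exit 1
fi
echo "conflict marker lines in C: $(grep -c '^<<<<<<<' C)"

# The careless resolution: discard our side, keep their file as-is.
git checkout --theirs -- C
git add C
git commit -q --no-edit     # Git records whatever was staged as M

# M's copy of C is byte-for-byte the copy from L; our fix is gone.
if git diff --quiet HEAD^2 HEAD -- C; then
    echo "C in M is identical to C in L"
fi
if ! git diff --quiet HEAD^1 HEAD -- C; then
    echo "C in M differs from C in K: our change was dropped"
fi
```

Nothing in this sequence produces a warning: the commit is a perfectly valid merge as far as Git is concerned.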

Back to your problem at hand

So let's go back to our commit viewer. You ask it to view commit M. It shows you the metadata as usual—the name of whoever made the commit, and so on. It may or may not show you both parent hash IDs, depending on the viewer. It probably shows you the log message that the person who ran git merge used to record why they did the merge, and save any important notes. If this person was super-diligent, the log message might even be useful ... but alas, most people use the automatically generated, mostly-worthless log message: "merge branch ...".

Now your viewer should go on to show you what changed in this commit. But now there's a problem. To show what changed, the viewer has to look at the parent commit and compare. There isn't one parent. There are two parents. Which one will the viewer use?

The actual answer here depends on the viewer. Some viewers just give up completely and show you nothing. For instance, git log -p does exactly this. It sounds like you may be using this kind of viewer. Another viewer, the one that git show runs, tries to be useful: it actually compares the merge M against both parents, K and L. But alas, this viewer tries to be too helpful. It's concerned with places where the merge might have had merge conflicts, so it does not display any files where the file in M exactly matches either the one in K, or the one in L.

If the person who made the merge did it incorrectly by throwing away some of the file changes that should have been in M, this kind of viewer will likewise throw away those changes from the display. Suppose, in the example above, that file C in M exactly matches their copy from commit L. Then git show, as a merge viewer, will not show you file C.

(Using git log as a commit viewer is even worse, of course: it doesn't show you any of A, B, C, or D, even though those are the four files that had some changes.)

You can instruct git log (and git show) to break up a merge commit into two virtual commits. That is, given:

...--K
      \
       M
      /
...--L

you can get them to pretend that they have:

...--K--M1

...--L--M2

and show you first K-vs-M1, then L-vs-M2. That's often somewhat useful for these cases. To do this, add -m to git log or git show. (Note that M1 and M2 never go into the repository, they're just pretend-commits for the duration of the "show me the difference" part of viewing a merge commit.)
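The difference between the viewers can be observed on a deliberately bad merge. This is a sketch with invented file contents; it assumes default settings (in particular, no log.diffMerges configuration), and with git log the -m option needs -p alongside it to produce a patch.

```shell
#!/bin/sh
# Sketch: how git log -p, git show and git show -m display a merge
# whose file C was taken wholesale from the second parent (L).
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/mainline
git config user.name "Example"              # hypothetical identity
git config user.email "example@example.com"

printf 'alpha\nbeta\ngamma\n' > C
git add C
git commit -q -m "*"
git branch branch
printf 'alpha\nbeta (our fix)\ngamma\n' > C
git commit -q -am "K"
git checkout -q branch
printf 'alpha\nbeta (their edit)\ngamma\n' > C
git commit -q -am "L"
git checkout -q mainline
git merge --no-edit branch >/dev/null 2>&1 || true   # stops on the conflict
git checkout --theirs -- C                           # bad resolution
git add C
git commit -q --no-edit                              # merge commit M

# Count the file diffs each viewer prints for M.
# "|| true" keeps set -e from aborting when grep counts zero matches.
log_p=$(git log -p -1 HEAD | grep -c '^diff' || true)
show_cc=$(git show --format= HEAD | grep -c '^diff' || true)
show_m=$(git show -m --format= HEAD | grep -c '^diff --git' || true)

echo "git log -p : $log_p file diffs"
echo "git show   : $show_cc file diffs"
echo "git show -m: $show_m file diffs"

# Comparing M directly with its first parent K shows what was lost.
git diff HEAD^1 HEAD -- C
```

With -m, the comparison against K shows C losing our line, while the comparison against L is empty because the two copies are identical.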

The bottom line, as it were

If someone makes a bad merge snapshot, many viewers just won't show you that. The way to find it is to look at the commits before and after the merge. If someone keeps doing this, you'll need to teach them how to merge correctly. It's rare that "throw their changes away and use mine instead" is correct. Git offers this as an option, but they should use that option with care, not just because it solves their conflicts.


2 Comments

Git Show shows the files in question. Thank you for the thorough explanation!
@torek This is a woefully underrated answer that clarified several things in git. Thank you!
