git diff - output showing changes "incorrectly"

Question

Let's say I get the diff output of comparing 2 files:

    example
    example
    example
    example
    example
    example
    example
    example

and

    example
    example#
    example
    example
    example
    example#
    example#
    example

So basically, the only difference I made to the original file was adding #-marks to some of the lines. For these 2 files, the diff output would be:

    ...
     example
    +example#
     example
     example
    -example
    -example
    -example
    +example#
    +example#
     example
    ...

So the diff command basically thinks that the first #-mark that I put on the second line is a completely new line in the file. Is there any way to make the diff output the changes like this:

    ...
     example
    -example
    +example#
     example
     example
    -example
    -example
    +example#
    +example#
     example
    ...

This would make my life easier. Thanks!

Those are logically equivalent. There's no way a machine (or even another human) could read your mind to know which of the possible equivalents would "make more sense to you"... and for the next user, the reverse might make more sense. So: No, there's no way to do this.
– Jonathan Hall, Commented Jun 8, 2018 at 13:30
I see what you mean, but in all seriousness, if you read both the target and source file, would you expect the diff output to be like it is in this case? The output I prefer would make more sense objectively, I don't think you can really argue with that.
– HarMala, Commented Jun 11, 2018 at 9:07
No, it doesn't make more sense objectively. It makes more sense if all you did was add # marks to your file. But there's no way to know if that's what you did. Maybe you literally added a line example#, deleted a few lines example, and added others example#. Or maybe you did a combination of adding and subtracting lines, and modifying lines. There is no objective truth here to be found, other than the start and end states.
– Jonathan Hall, Commented Jun 11, 2018 at 9:38
Again, if you look at the two files, would you expect the diff output to be like it is in this case? In my opinion, it's pretty clear that the only changes made are the added #-marks. It's irrelevant how the modified file got to that point: like you said, only the start and end states are relevant. Probably a better way of phrasing it would be "make more sense objectively for humans, perhaps not for machines". But there isn't really anything I can do about this issue, so arguing about this is useless.
– HarMala, Commented Jun 11, 2018 at 10:07
It's clear to you because you have a preconceived idea of what you did to the file. Let's use another example: a list of employees. Version one says "Robert Jones, Bob Smith, Alice A. Johnson". Version two says "Robert B. Jones, Bob Smith, Alice A. Johnson". Did someone just add 'B.'? Or was Robert Jones fired and replaced by a Robert B. Jones? There's no way to tell.
– Jonathan Hall, Commented Jun 11, 2018 at 11:00

Sergio Lema · Accepted Answer · 2018-06-08 13:15:25Z

The example you show consists entirely of lines with the same content, so git's diff algorithm cannot distinguish one line from another. If the lines were different, it would show you which line changed (as additions or removals). To go further, you can use git diff --word-diff (https://git-scm.com/docs/git-diff#git-diff---word-diffltmodegt) to show the differences per word rather than per line; adding --word-diff-regex=. makes it compare character by character.
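To give a feel for what a word-level comparison reports, here is a minimal Python sketch built on the standard library's difflib. It is not how git implements --word-diff; the function name word_diff and the sample strings are made up for illustration, and the [-...-] and {+...+} markers are only borrowed from the look of git's plain word-diff output.

```python
import difflib
import re


def word_diff(old: str, new: str) -> str:
    """Return a word-level diff of two strings.

    Removed words are wrapped in [-...-] and added words in {+...+}.
    """
    # Split into alternating runs of whitespace and non-whitespace so that
    # the original spacing survives when the tokens are joined again.
    old_tokens = re.findall(r"\s+|\S+", old)
    new_tokens = re.findall(r"\s+|\S+", new)

    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.append("".join(old_tokens[i1:i2]))
            continue
        if tag in ("replace", "delete"):
            pieces.append("[-" + "".join(old_tokens[i1:i2]) + "-]")
        if tag in ("replace", "insert"):
            pieces.append("{+" + "".join(new_tokens[j1:j2]) + "+}")
    return "".join(pieces)


if __name__ == "__main__":
    # Only the changed word is flagged; the rest of the line is left alone.
    print(word_diff("alpha beta gamma", "alpha beta# gamma"))
```

Comparing at word granularity sidesteps the question of which whole line "really" changed, because each line is matched against its counterpart and only the differing token is marked.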

Mark Adelsberger · Accepted Answer · 2018-06-08 12:55:32Z

I mean, you can try specifying different algorithms and see if, in a given case, one of them gives you a result you like better. See the git diff docs (https://git-scm.com/docs/git-diff); there's a --diff-algorithm option where you can pick patience, minimal, histogram, or myers.

Is any of them going to do what you want in this case? According to my tests, no; but then, I assume this may be an exaggerated example case, so maybe one of them would help in your real scenario. I'm not aware of a good "practical" explanation of when each is best or how their output differs; but now that you have the names of the algorithms, I suppose you can decide if it's worth trying to research all that.
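If you want to repeat that kind of test on your own files, a small Python wrapper can run git once per algorithm and collect the output. This is only a sketch: it assumes git is installed and on your PATH, the helper names build_diff_command and compare_algorithms are invented here, and it uses --no-index so the two files do not need to live in a repository.

```python
import subprocess
import sys

# The four names accepted by git diff's --diff-algorithm option.
ALGORITHMS = ("myers", "minimal", "patience", "histogram")


def build_diff_command(algorithm: str, old_path: str, new_path: str) -> list[str]:
    """Build the git command that diffs two plain files with one algorithm."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown diff algorithm: {algorithm}")
    return [
        "git", "diff", "--no-index",
        f"--diff-algorithm={algorithm}",
        "--", old_path, new_path,
    ]


def compare_algorithms(old_path: str, new_path: str) -> dict[str, str]:
    """Run every algorithm and return its diff output keyed by name."""
    results = {}
    for algorithm in ALGORITHMS:
        # With --no-index, git exits with status 1 when the files differ,
        # so a non-zero exit code is not treated as an error here.
        completed = subprocess.run(
            build_diff_command(algorithm, old_path, new_path),
            capture_output=True,
            text=True,
        )
        results[algorithm] = completed.stdout
    return results


if __name__ == "__main__":
    # Usage (hypothetical script name): python compare_diffs.py old.txt new.txt
    if len(sys.argv) == 3:
        for name, output in compare_algorithms(sys.argv[1], sys.argv[2]).items():
            print(f"=== {name} ===")
            print(output)
```

Reading the four outputs side by side is usually the quickest way to see whether any algorithm groups the changes the way you would like.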

I'd say there's more to generating a diff than seems obvious. Often there are multiple patches that will get you from point A to point B, and it can be open to interpretation which is "better". In some cases it's possible for special-purpose diff tools to use awareness of language structure to be a little smarter; but what you're showing is a highly repetitive file with little to indicate structure, so I don't even think that kind of thinking would necessarily help here.
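To make the "multiple patches" point concrete, the following Python sketch applies both hunks from the question, the one git printed and the one the asker would prefer, and checks that they produce the same result. It is an illustration only: apply_edit_script is a made-up helper rather than anything git provides, and the seven-line old file is just the visible part of the hunk, with the lines elided as "..." left out.

```python
def apply_edit_script(old_lines, script):
    """Apply (op, text) pairs to old_lines and return the new lines.

    op is ' ' (keep the line), '-' (delete it) or '+' (add a new line).
    """
    result = []
    position = 0  # index of the next unread line in old_lines
    for op, text in script:
        if op not in (" ", "-", "+"):
            raise ValueError(f"unknown operation: {op!r}")
        if op == "+":
            result.append(text)
            continue
        # ' ' and '-' both consume one old line, which has to match.
        if position >= len(old_lines) or old_lines[position] != text:
            raise ValueError(f"script does not apply at old line {position + 1}")
        if op == " ":
            result.append(text)
        position += 1
    if position != len(old_lines):
        raise ValueError("script does not cover the whole old file")
    return result


E, H = "example", "example#"
old = [E] * 7

# The hunk git actually printed in the question.
reported = [(" ", E), ("+", H), (" ", E), (" ", E), ("-", E),
            ("-", E), ("-", E), ("+", H), ("+", H), (" ", E)]

# The hunk the asker would have preferred to see.
preferred = [(" ", E), ("-", E), ("+", H), (" ", E), (" ", E),
             ("-", E), ("-", E), ("+", H), ("+", H), (" ", E)]

if __name__ == "__main__":
    # Both scripts turn the same old lines into the same new lines.
    print(apply_edit_script(old, reported) == apply_edit_script(old, preferred))
```

Both scripts also have the same size (three deletions, three additions), so a diff algorithm that looks for a short edit script has no objective reason to pick one over the other.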
