Every time I see a conflict in my SCM over something like imports or method signature changes (e.g. renamed variables), I wonder whether there is a language-aware diff/merge method that can handle the more annoying small changes that happen on a shared project. Is there anything out there that handles such conflicts more smoothly and works in a Unix environment?
- Good idea. Sounds like the concept for your next open source project :) – Asaph, Nov 29, 2009
- Well, the "low-hanging fruit" cases are so easy that I still believe somebody must have thought about this before I started this question. – Marcus, Nov 30, 2009
- Seems to be a duplicate of stackoverflow.com/questions/523307/semantic-diff-utilities – Marcus, Nov 30, 2009
- Agreed, this should probably be closed. And agreed, I've always wondered why merges couldn't be made smarter in this way. – Kevin Bourrillion, Dec 1, 2009
6 Answers
I agree that it would be awesome if such a tool existed, but I'm not aware of any. I believe the reason is that the merge algorithm in each SCM (git, hg, bzr, svn, etc.) works on the lowest common denominator, which is plain text. For these SCM tools to really understand a language's syntax and semantics, they would have to be able to parse it. That seems too big a task for any SCM: it would need parsers for Java, C#, Python, Ruby, Groovy, C, C++, etc., and each of these languages also changes its syntax between versions (e.g. Java generics did not exist until 1.5). So the SCM would also have to detect, or be configured with, the language and language version each source file is written in.
I think it is more likely that any language-dependent merge feature would be found in a third-party merge tool (e.g. the merge.tool setting in .gitconfig or the ui.merge setting in .hgrc). Such a tool could be configured to know that the .java files in your project are written in Java 1.6, and then use the parsing features in the JDK to build the AST and perform some "deep" analysis of whether a change is meaningful in the context of that language.
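The core of that "deep" analysis can be sketched in a few lines. The answer talks about the JDK's Java parser; as a stand-in that is easy to run, the sketch below uses Python's standard ast module on Python source instead, so treat it as an analogy rather than a Java tool. It shows the basic question a language-aware merge tool can ask: if two revisions parse to the same tree, they differ only in layout or comments, and a text-level conflict between them is not a real one. The function name same_ast is hypothetical.

```python
import ast


def same_ast(old_src: str, new_src: str) -> bool:
    """Return True if both sources parse to structurally identical ASTs.

    ast.dump() leaves out line/column attributes by default and the parser
    discards comments, so whitespace and comment edits do not count as changes.
    """
    return ast.dump(ast.parse(old_src)) == ast.dump(ast.parse(new_src))


original = "total = price * qty\nprint( total )\n"
reformatted = "total=price*qty  # compute total\nprint(total)\n"
renamed = "amount = price * qty\nprint(amount)\n"

print(same_ast(original, reformatted))  # True: only layout and a comment changed
print(same_ast(original, renamed))      # False: a variable was renamed
```

A real tool would need to go further (for example, recognising that a rename was applied consistently to every use), but even this equality check is information a plain-text merge does not have.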
I'm looking for the exact same thing. Merge tool vendors should probably address this sort of semantic, language-aware merging; if not, I'll have to become one :)
For now, as a poor man's trick, I sometimes preprocess the 3 files (base, ours, theirs) to their 'canonical form' by feeding them through Eclipse's Code Cleanup/Organize Imports/Order Members.
Although limited, this works nicely: last time it reduced the number of conflicts from ~200 to 2. I'm planning to wrap this in a script and plug it into git as a merge tool.
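For the import block alone, the "canonical form" step can be approximated without Eclipse. The sketch below is a hypothetical helper (canonicalize_imports is not part of Eclipse or any other tool mentioned here) that sorts and de-duplicates the top-level import lines of a Java file, keeping them where the first import was. It assumes each import sits on its own line starting at column 0, and it does not reorder members or reformat code the way Eclipse's Code Cleanup does.

```python
def canonicalize_imports(source: str) -> str:
    """Sort and de-duplicate top-level Java import lines, keeping them where
    the first import was. Everything else is left untouched."""
    lines = source.splitlines(keepends=True)
    positions = [i for i, line in enumerate(lines) if line.startswith("import ")]
    if not positions:
        return source

    # Unique import statements in lexicographic order (static imports included).
    imports = sorted({lines[i].rstrip() for i in positions})

    skip = set(positions)
    others = [line for i, line in enumerate(lines) if i not in skip]

    # No import line precedes positions[0], so that index is unchanged in `others`.
    first = positions[0]
    block = [imp + "\n" for imp in imports]
    return "".join(others[:first] + block + others[first:])


example = (
    "package p;\n\n"
    "import java.util.List;\n"
    "import java.io.File;\n"
    "import java.util.List;\n\n"
    "class A {}\n"
)
print(canonicalize_imports(example))
```

The idea would be to run it over the base, ours and theirs versions before merging them, for example with git merge-file ours base theirs, which writes the merged result into the first file. Because all three inputs share the same ordering, reordered imports no longer show up as conflicts.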
I have also written a script to auto-resolve Java import conflicts; it simply keeps both sides of the imports and adds a comment explaining what happened and what to do next: 'organise imports'.
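That script isn't shown, so here is a hypothetical reconstruction of the idea (not the author's code). It scans a file containing Git's conflict markers and, for any hunk whose two sides consist only of import lines (or blank lines), replaces the hunk with the union of both sides plus a TODO comment; any hunk that touches other code is left exactly as it was for a human to resolve. It assumes the default <<<<<<< / ======= / >>>>>>> markers, with an optional diff3-style ||||||| base section that it ignores.

```python
IMPORT_COMMENT = "// TODO: import conflict auto-resolved by keeping both sides; run 'organise imports'\n"


def is_import_or_blank(line: str) -> bool:
    stripped = line.strip()
    return stripped == "" or stripped.startswith("import ")


def resolve_import_conflicts(text: str) -> str:
    """Replace conflict hunks that contain only import (or blank) lines with the
    union of both sides; leave every other hunk exactly as it was."""
    lines = text.splitlines(keepends=True)
    out, i = [], 0
    while i < len(lines):
        if not lines[i].startswith("<<<<<<<"):
            out.append(lines[i])
            i += 1
            continue

        # Walk to the closing marker, collecting each side of the hunk.
        ours, theirs, side = [], [], "ours"
        j = i + 1
        while j < len(lines) and not lines[j].startswith(">>>>>>>"):
            if lines[j].startswith("|||||||"):  # diff3-style base section: skipped
                side = "base"
            elif lines[j].startswith("======="):
                side = "theirs"
            elif side == "ours":
                ours.append(lines[j])
            elif side == "theirs":
                theirs.append(lines[j])
            j += 1

        if j == len(lines):  # unterminated hunk: copy the rest verbatim
            out.extend(lines[i:])
            break

        if all(is_import_or_blank(line) for line in ours + theirs):
            # Union of both sides, first occurrence wins, original order kept.
            seen = []
            for line in ours + theirs:
                imp = line.strip()
                if imp and imp not in seen:
                    seen.append(imp)
            out.append(IMPORT_COMMENT)
            out.extend(imp + "\n" for imp in seen)
        else:
            out.extend(lines[i:j + 1])  # real conflict: keep it for a human
        i = j + 1
    return "".join(out)
```

Keeping both sides can leave an unused import behind, which is why the comment asks for an 'organise imports' pass rather than claiming the result is final.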
Mergiraf offers syntax-aware merging for a variety of languages and, for instance, typically handles conflicting Java imports.
Git can be configured to delegate merging of specific files to it, by registering it as a merge driver. The tool can also be invoked manually after encountering a conflict.
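Git's merge-driver hook is what makes this pluggable: a .gitattributes line such as *.java merge=<name> routes matching files to the command configured as merge.<name>.driver, which Git runs with placeholders such as %O (common ancestor), %A (our version, where the result must be written) and %B (their version); a zero exit status means a clean merge. As a toy illustration of that contract only (not of how Mergiraf works, since it merges full syntax trees), here is a hypothetical driver that understands nothing but Java import lines. It treats imports as sets, and gives up whenever the rest of the file changed on both sides.

```python
import sys


def split_imports(source: str):
    """Return (set of import statements, body lines with the imports removed)."""
    imports, body = set(), []
    for line in source.splitlines(keepends=True):
        if line.startswith("import "):
            imports.add(line.rstrip())
        else:
            body.append(line)
    return imports, body


def merge_java_file(base: str, ours: str, theirs: str):
    """Three-way merge that understands only import lines.

    Returns the merged text, or None when the non-import code changed on both
    sides (a conflict this sketch does not try to resolve)."""
    b_imp, b_body = split_imports(base)
    o_imp, o_body = split_imports(ours)
    t_imp, t_body = split_imports(theirs)

    # Keep an import if both sides have it or either side added it;
    # an import deleted by one side and untouched by the other is dropped.
    merged = sorted((o_imp & t_imp) | (o_imp - b_imp) | (t_imp - b_imp))

    # Pick the body: identical on both sides, or changed by only one side.
    if o_body == t_body or t_body == b_body:
        template = ours
    elif o_body == b_body:
        template = theirs
    else:
        return None

    lines = template.splitlines(keepends=True)
    positions = [i for i, line in enumerate(lines) if line.startswith("import ")]
    if not positions:
        return None if merged else template  # nowhere obvious to put the imports
    skip = set(positions)
    body = [line for i, line in enumerate(lines) if i not in skip]
    first = positions[0]
    return "".join(body[:first] + [imp + "\n" for imp in merged] + body[first:])


def main(argv) -> int:
    # Git invokes the driver as: <script> %O %A %B
    base_path, ours_path, theirs_path = argv[1:4]
    texts = []
    for path in (base_path, ours_path, theirs_path):
        with open(path, encoding="utf-8") as f:
            texts.append(f.read())
    merged = merge_java_file(*texts)
    if merged is None:
        return 1  # non-zero exit: Git marks the file as conflicted
    with open(ours_path, "w", encoding="utf-8") as f:
        f.write(merged)  # the result must end up in %A
    return 0


# Only run as a driver when called with exactly three paths.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv))
```

To try it, you could run git config merge.javaimports.driver "python3 java_import_driver.py %O %A %B" and add *.java merge=javaimports to .gitattributes; both the driver name javaimports and the script name are made up for this example. A practical driver would fall back to a textual merge (for instance git merge-file %A %O %B) instead of just exiting with 1, because on a non-zero exit whatever is left in %A becomes the conflicted file.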
For diffing, there also exist syntax-aware tools that can be configured with Git, such as difftastic.
To make it easier for anyone landing on this page: this question is a duplicate of http://stackoverflow.com/questions/523307/semantic-diff-utilities (this is mentioned under the question itself, but not prominently).
The current tool I am aware of (the answer to the question linked above) is SemanticMerge: https://www.semanticmerge.com
There is also https://www.devart.com/codecompare, which is close to what you want.
You might want to look into having everyone on your team share the same IDE settings for things like order of imports, formatting, etc., to avoid conflicts like this from occurring in the first place.
Doesn't git rebase solve this problem? Any variable renames will be accounted for in the associated commits, and git rebase lets you stay in sync with upstream commits. As long as you rebase frequently (daily-ish?), you shouldn't be getting stupid conflicts like that, and if you are, they are probably real conflicts that a Java grammar parser couldn't solve anyway.