1

Hadoop currently has three branches: 0.2x, 1.x, and 2.x. What are the arguments in favor of using one over another?

2 Answers

2

Hadoop recently reworked its MapReduce implementation: in the 2.x branch, MapReduce runs on a new resource-management layer called YARN. That may be one reason to go for a relatively new version.

If you want to use Hadoop in conjunction with related projects such as HBase, finding a combination of versions that work together is not trivial.

You may want to look at Cloudera's offering (I am not affiliated with Cloudera). They offer distributions from which you can pick a subset of tools that are known to work with each other. They also offer professional services.

2 Comments

They all look new. The latest one is currently the 0.2x branch! Or maybe the 2.x snapshots?
CDH4 contains Hadoop 2.0.0 (at least that is what the package says).
1

One way to deal with the many versions of Hadoop out there is to go with the Cloudera offerings. Distributions like these make things easier, and you don't have to worry as much about configuration.

2 Comments

Some packages included with the Cloudera distribution seem pretty dated; e.g. Mahout is a version or two behind. Is it possible to update selected components?
I lost significant time when starting out in the Hadoop space because of versions that did not work together, and because information about which versions work together is not always easy to find. If you know what a component depends on and whether the versions are compatible, then that may well work. Be careful, though.
