Hadoop Technologies Architecture Overview @senthil245 Mail - senthil245@gmail.com
DISTRIBUTED CLUSTER ARCHITECTURE: MASTER/SLAVE
HADOOP CORE
MAPREDUCE PATTERNS
WHEN MAPREDUCE Since the MapReduce is running within a cluster of computing nodes, the architecture is very scalable. • In other words, if the data size is increased by the factor of x, the performance should be still constant if we are adding a predictable/fixed factor of y. The graph on the right is illustrating the relationship between the size of the data (xaxis) and processing time (y-axis). •The blue color curve is the process using traditional programming. On the other hand, the black color curve is the process using Hadoop. When the data size is small, traditional programming is better performance because the bootstrap of Hadoop is expensive (Copy the data within the cluster, inter-nodes communication, etc.). Once the data size is big enough, the penalty of the Hadoop bootstrap becomes invisible. •Hence Hadoop is best suited for Big Data crunching ideally in terms of petaBytes and is not suited for implementing common data integration patterns
APACHE SQOOP
APACHE FLUME
APACHE CHUKWA
HDFS
APACHE OOZIE – WORKFLOW SCHEDULER (ALSO CHECK AZKABAN, LINKEDIN'S OPEN-SOURCE SCHEDULER)
PIG AND HQL (DO NOT USE HQL)
APACHE S4 (STREAM PROCESSING) (ALSO CHECK KAFKA AND STORM)
APACHE ZOOKEEPER SERVICE (ALSO CHECK APACHE HUE)
APACHE HIVE
APACHE HCATALOG, HIVE AND HBASE

Hadoop Ecosystem Architecture Overview