Big-Data Processing utilizing Open-Source Technology Stack
By Amir Sedighi
http://www.linkedin.com/in/amirsedighi
@amirsedighi
Linux and Ubuntu 14.10 Release Conf
References
● http://www.slideshare.net/BernardMarr/140228-big-data-slide-share?qid=017848e2-9e2a-4dc3-963c-52b6a90fba2a&v=default&b=&from_search=1
● http://www.forbes.com/fdc/welcome_mjx.shtml
● ZYMR Spark Your Real-Time Big Data Analytics
● http://dataconomy.com
● https://datakulfi.wordpress.com/2013/03/27/big-data-open-source-technology-landscape/
● http://www.slideshare.net/andrefaria/big-data-abc?qid=1ac97e4a-4acc-460a-b3f8-9122f7210440&v=qf1&b=&from_search=12
● https://wiki.apache.org/hadoop/PoweredBy
Data Explosion
● Big-Data means that everything we do increasingly leaves a digital trace that we (or others) can gather, use, and analyze.
– Data Providers
  ● Businesses
  ● People
Volume, Velocity, Variety
● “There was 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing.” (Eric Schmidt)
Big-Data Processing
How to provide a Big-Data processing platform using commodity machines?
Vertical or Horizontal?
Scale Up vs Scale Out
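Scaling out spreads data and work over many commodity machines instead of moving to one larger machine. One common way to decide which machine holds which record is hash partitioning. The following is a minimal sketch under assumptions: the node names and the `node_for` and `partition` helpers are hypothetical and do not come from the slides.

```python
import hashlib


def node_for(key, nodes):
    """Pick a node for a key using a stable hash, so every client agrees."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def partition(keys, nodes):
    """Return a mapping of node name to the list of keys placed on it."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


if __name__ == "__main__":
    # Hypothetical cluster of three commodity machines.
    cluster = ["node-a", "node-b", "node-c"]
    records = ["user:%d" % i for i in range(10)]
    for node, keys in partition(records, cluster).items():
        print(node, keys)
```

Plain modulo hashing moves most keys when a node is added or removed; systems that resize often tend to use consistent hashing to limit that movement.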
Big-Data Processing: Open-Source Technology Stack
Map-Reduce
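The Map-Reduce model can be illustrated with the usual word-count example: a map step emits key/value pairs, the framework groups them by key, and a reduce step combines each group. This is a single-process sketch in plain Python, not Hadoop code; the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a list of text documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big stack", "open source stack"]))
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network; the sketch keeps only the data flow.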
Hadoop Framework
Apache Hadoop Main Projects
Data Stores
– Key-Value
– Graph
– Columnar
– Document Store
– In-Memory
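The store categories above differ mainly in how they shape data. As a rough illustration, the same small data set is laid out below in four of those shapes using plain Python structures. The `users` records are hypothetical, and no real database API is shown. "In-Memory" describes where the data lives rather than its shape, so it has no separate structure here.

```python
# Hypothetical sample records used for every layout below.
users = [
    {"id": "u1", "name": "Ada", "follows": ["u2"]},
    {"id": "u2", "name": "Linus", "follows": []},
]

# Key-value: one opaque value looked up by a single key.
key_value = {u["id"]: u["name"] for u in users}

# Document: each key maps to a nested, self-describing document.
documents = {u["id"]: u for u in users}

# Columnar: all values of one attribute are stored together.
columnar = {
    "id": [u["id"] for u in users],
    "name": [u["name"] for u in users],
}

# Graph: nodes plus explicit edges between them.
nodes = [u["id"] for u in users]
edges = [(u["id"], target) for u in users for target in u["follows"]]

if __name__ == "__main__":
    print(key_value)
    print(columnar)
    print(edges)
```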
Data Transfer
● Apache Flume
● Apache Sqoop
Search
● Elasticsearch
● Apache Solr
Messaging and Queuing
● Apache Kafka
● ZeroMQ
Log Management
● ELK
● Logstash
● Fluentd
Stream Processing
● Apache Storm
● Apache Samza
● Apache Spark
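Stream processors handle records as they arrive instead of in batches. One basic building block is counting events per fixed (tumbling) time window. The sketch below is plain Python and does not use the API of Storm, Samza, or Spark; `tumbling_counts` and the sample events are hypothetical, and the input is assumed to be ordered by timestamp.

```python
def tumbling_counts(events, window_seconds):
    """Yield (window_start, count) for each non-empty fixed-size window.

    `events` is an iterable of (timestamp, payload) pairs in time order.
    """
    current_start = None
    count = 0
    for timestamp, _payload in events:
        # Align the timestamp to the start of its window.
        start = timestamp - (timestamp % window_seconds)
        if current_start is None:
            current_start = start
        if start != current_start:
            # The previous window is complete, so emit it and start a new one.
            yield current_start, count
            current_start, count = start, 0
        count += 1
    if current_start is not None:
        # Emit the last window once the stream ends.
        yield current_start, count


if __name__ == "__main__":
    sample = [(0, "a"), (3, "b"), (10, "c"), (11, "d"), (25, "e")]
    for window in tumbling_counts(sample, 10):
        print(window)
```

Windows with no events are skipped rather than reported as zero, which keeps the sketch short.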
Machine Learning
● Apache Mahout
● MLlib
● GraphX
Questions?

Opensource Frameworks and BigData Processing
