positiveblue/heron_cpp_prototype

Heron C++ prototype

Heron is a real-time analytics platform developed by Twitter. It is the direct successor of Apache Storm, built to be backwards compatible with Storm's topology API but with a wide array of architectural improvements.

Back in September 2016, I attended a talk given by Karthik, lead of real-time analytics at Twitter. After the talk I spoke with Karthik to see if I could get involved in the project. Heron is mainly written in Java, Python and C++, but the only robust user-facing API is in Java. They had just started to write a lightweight version for Python and, given that many companies have their infrastructure in C++, having this platform available in C++ would be great.

The main concept in Heron is the Tuple. Tuples are implemented in Java as a map/array of objects. They are serialized, sent through the network and deserialized using the features and flexibility that the JVM provides. In C++ the architecture has to be a bit more complicated.

In this repository you can find a prototype of what a possible C++ implementation could look like. It is completely detached from the Heron code, but it is a good guide to what is necessary in a real-world implementation.

Architecture

The main features that I implemented are:

  • Element: An abstract (pure virtual) class with two main methods: save and load. They are used for serialization/deserialization, which relies on a C++ library called Cereal.
  • Basic types: Element wrappers for some basic types (String, Int, Double).
  • Tuple: A collection (unordered_map) of Elements.
  • Serializer: A pluggable, user-defined serializer that determines how tuples are serialized and deserialized.
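The first three pieces above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this repository: the string-based save/load signatures, the default constructors and the exampleSalary helper are assumptions made for the sketch, and plain strings stand in for Cereal so that the example has no external dependency.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Abstract base: every value stored in a tuple can save and load itself.
// Assumption: the wire format is a plain string (the prototype uses Cereal).
class Element {
public:
    virtual ~Element() = default;
    virtual std::string save() const = 0;
    virtual void load(const std::string& data) = 0;
};

class Int : public Element {
public:
    explicit Int(int v = 0) : value_(v) {}
    int getValue() const { return value_; }
    std::string save() const override { return std::to_string(value_); }
    void load(const std::string& data) override { value_ = std::stoi(data); }
private:
    int value_;
};

class Double : public Element {
public:
    explicit Double(double v = 0.0) : value_(v) {}
    double getValue() const { return value_; }
    // std::to_string keeps six decimals, which is enough for a sketch.
    std::string save() const override { return std::to_string(value_); }
    void load(const std::string& data) override { value_ = std::stod(data); }
private:
    double value_;
};

class String : public Element {
public:
    explicit String(std::string v = "") : value_(std::move(v)) {}
    const std::string& getValue() const { return value_; }
    std::string save() const override { return value_; }
    void load(const std::string& data) override { value_ = data; }
private:
    std::string value_;
};

// A tuple maps field names to type-erased elements.
class Tuple {
public:
    void Set(const std::string& key, std::shared_ptr<Element> element) {
        elements_[key] = std::move(element);
    }
    // Returns nullptr when the key is not present.
    std::shared_ptr<Element> Get(const std::string& key) const {
        auto it = elements_.find(key);
        if (it == elements_.end()) return nullptr;
        return it->second;
    }
private:
    std::unordered_map<std::string, std::shared_ptr<Element>> elements_;
};

// Usage: store two values and read one of them back.
int exampleSalary() {
    Tuple tuple;
    tuple.Set("Salary", std::make_shared<Int>(15));
    tuple.Set("Worker", std::make_shared<String>("Jordi"));
    auto salary = std::static_pointer_cast<Int>(tuple.Get("Salary"));
    assert(salary != nullptr);
    return salary->getValue();
}
```

The common base class is what lets a single unordered_map hold values of different types, which the JVM gives Java for free through Object.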

Example

Here is an example of what it looks like. I implemented two auxiliary classes, Tweet and User, to illustrate it.

First, we create the elements that we want to work with:

    // Elements
    std::shared_ptr<Element> eInt(new Int(15));
    std::shared_ptr<Element> eDouble(new Double(3.14159));
    std::shared_ptr<Element> eString(new String("Jordi"));

We then set them in a tuple:

    // Tuple
    Tuple tuple;
    tuple.Set("Worker", eString);
    tuple.Set("Salary", eInt);
    tuple.Set("Phi", eDouble);

Using dependency injection, we serialize the tuple (the user has to define the order in which a tuple is serialized/deserialized):

    ////////////////////////
    //* Sending a tuple *//
    //////////////////////

    // sstream: Will contain the serialization of a tuple
    std::stringstream os;

    // Serializer
    IPluggableSerializer *CSerializer = new CerealSerializer();

    tuple.serialize(CSerializer, os);
    writeToFile("serialize.out", os);
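To make the dependency-injection idea concrete, here is a rough sketch of how a pluggable serializer could be wired into a tuple. Only the name IPluggableSerializer comes from the example above. The write/read methods, the Fields alias, the LineSerializer class and its "key=value" line format are hypothetical, and the tuple is reduced to string values to keep the sketch short.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Simplified stand-in for the tuple contents: string keys and string values.
using Fields = std::map<std::string, std::string>;

// Strategy interface: the tuple does not know the wire format.
class IPluggableSerializer {
public:
    virtual ~IPluggableSerializer() = default;
    virtual void write(const Fields& fields, std::ostream& os) const = 0;
    virtual void read(Fields& fields, std::istream& is) const = 0;
};

// Hypothetical format: one "key=value" pair per line.
// Limitation: keys must not contain '=' and values must not contain newlines.
class LineSerializer : public IPluggableSerializer {
public:
    void write(const Fields& fields, std::ostream& os) const override {
        for (const auto& kv : fields) {
            os << kv.first << '=' << kv.second << '\n';
        }
    }
    void read(Fields& fields, std::istream& is) const override {
        std::string line;
        while (std::getline(is, line)) {
            const auto pos = line.find('=');
            if (pos == std::string::npos) continue;  // skip malformed lines
            fields[line.substr(0, pos)] = line.substr(pos + 1);
        }
    }
};

class Tuple {
public:
    void Set(const std::string& key, const std::string& value) {
        fields_[key] = value;
    }
    // Returns an empty string when the key is not present.
    std::string Get(const std::string& key) const {
        auto it = fields_.find(key);
        return it == fields_.end() ? std::string() : it->second;
    }
    // The serializer is injected by the caller on every call.
    void serialize(const IPluggableSerializer& s, std::ostream& os) const {
        s.write(fields_, os);
    }
    void deserialize(const IPluggableSerializer& s, std::istream& is) {
        s.read(fields_, is);
    }
private:
    Fields fields_;
};

// Round trip through an in-memory stream instead of a file.
std::string roundTripWorker() {
    LineSerializer serializer;
    Tuple sent;
    sent.Set("Worker", "Jordi");

    std::stringstream buffer;
    sent.serialize(serializer, buffer);
    assert(buffer.str() == "Worker=Jordi\n");

    Tuple received;
    received.deserialize(serializer, buffer);
    return received.Get("Worker");
}
```

Because the tuple only depends on the interface, swapping the wire format means passing a different serializer object, with no change to the tuple itself.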

Deserializing the tuple is done by reading it back and casting the values:

    //////////////////////////
    //* Receiving a tuple *//
    ////////////////////////

    std::stringstream is;
    readFromFile("serialize.out", is);

    Tuple new_tuple;
    new_tuple.deserialize(CSerializer, is);

    auto Salary = std::static_pointer_cast<Int>(new_tuple.Get("Salary"));
    auto Phi = std::static_pointer_cast<Double>(new_tuple.Get("Phi"));
    auto Worker = std::static_pointer_cast<String>(new_tuple.Get("Worker"));

    std::cout << "Salary: " << Salary->getValue() << std::endl;
    std::cout << "Phi: " << Phi->getValue() << std::endl;
    std::cout << "Worker: " << Worker->getValue() << std::endl;
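Casting with std::static_pointer_cast assumes the receiver already knows the type stored under each key. If that assumption might not hold, a checked cast is one possible alternative. The sketch below is not part of the prototype: the GetAs helper and the stripped-down Element, Int and String classes are hypothetical and exist only to make the example self-contained.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Minimal polymorphic hierarchy, just enough to demonstrate the cast.
class Element {
public:
    virtual ~Element() = default;
};

class Int : public Element {
public:
    explicit Int(int v) : value_(v) {}
    int getValue() const { return value_; }
private:
    int value_;
};

class String : public Element {
public:
    explicit String(std::string v) : value_(std::move(v)) {}
    const std::string& getValue() const { return value_; }
private:
    std::string value_;
};

// Hypothetical helper: returns nullptr when the element is null
// or holds a different concrete type than T.
template <typename T>
std::shared_ptr<T> GetAs(const std::shared_ptr<Element>& element) {
    return std::dynamic_pointer_cast<T>(element);
}

// Usage: a wrong type is detected instead of causing undefined behaviour.
int exampleCheckedRead() {
    std::shared_ptr<Element> element = std::make_shared<Int>(15);

    auto wrong = GetAs<String>(element);
    assert(wrong == nullptr);

    auto right = GetAs<Int>(element);
    return right ? right->getValue() : -1;
}
```

The checked cast costs a runtime type lookup on every read, so the static cast remains reasonable when the schema of each tuple is fixed and known to both sides.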

This is more a proof of concept than a real-world implementation. If someone finds it interesting enough to integrate into Heron, I will be glad to help, but I am busy with other projects right now, so I could not do it alone.

License

This software is licensed under the same license as Heron. Learn more about it here

About

A C++ prototype of Heron the stream processing engine from Twitter
