Democratizing Fast Analytics with Ampool (Powered by Apache Geode-incubating)
Avinash Dongre and Robert Geiger, Ampool Inc.

Analytics Needs to be Faster!
Analytics needs to work in CLOSED LOOP with Apps
[Diagram: Analytics, Apps, Multi-Device Testing]

What are the CHALLENGES?
⚠ Many data users/stakeholders
⚠ Disparate tools & processing needs
⚠ Long time to insights

Meet the Challenges
DISTRIBUTED MEMORY LAYER …
Smart Distributed In-Memory Object Store

Smart Distributed In-Memory Object Store
CHOICE in Best of Breed Engines, …

Smart Distributed In-Memory Object Store
… PLUGGABLE distributed memory layer …

… FAST OBJECT ACCESS across the pipeline
Pipeline stages: Ingest, ETL, Analytics, App Use
Stakeholders: Data Architect, Data Developers, Business Analysts, Data Scientists

What ENABLERS can help here?
In-Memory Fabric Technology
• Apache Geode!
• Flexible, stable, and proven distributed in-memory technology
New memory technologies and fast network fabrics
• Storage Class Memory
• Low latency, high throughput, persistent
• Initially exposed via file system interface
• Regular or memory mapped

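The "regular or memory mapped" distinction above can be illustrated with a minimal sketch. It uses only Python's standard `mmap` module, and a temporary file stands in for a file on an SCM-backed file system. The record size and the function names are hypothetical and are not taken from Ampool, Geode, or any SCM library.

```python
import mmap
import os
import tempfile

RECORD_SIZE = 16  # hypothetical fixed-width record, in bytes


def write_record_regular(path, index, payload):
    """Regular file I/O: an explicit seek and write for each access."""
    with open(path, "r+b") as f:
        f.seek(index * RECORD_SIZE)
        f.write(payload.ljust(RECORD_SIZE, b"\0"))


def read_record_mapped(path, index):
    """Memory-mapped I/O: the file is addressed like a byte array."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            start = index * RECORD_SIZE
            return bytes(view[start:start + RECORD_SIZE]).rstrip(b"\0")


def demo():
    # A temporary file stands in for a file on an SCM-backed file system.
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * (RECORD_SIZE * 4))  # pre-size room for 4 records
        os.close(fd)
        write_record_regular(path, 2, b"hello")
        return read_record_mapped(path, 2)
    finally:
        os.remove(path)
```

Calling `demo()` writes one record through the regular interface and reads it back through the mapping. With a mapping, reads become slice operations on memory instead of explicit read calls.
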
Emerging Storage Class Memory (SCM) is DISRUPTIVE
Challenges the value proposition of in-memory solutions
Near-DRAM latency and throughput at lower cost
Based on one of several types of memory technology
• MRAM (magnetic)
• ReRAM (resistive)
• FRAM (ferroelectric), PCM (phase change)
• 3D XPoint™ (Intel/Micron)
Accessible via Java and C/C++ libraries
• Mnemonic (Java)
• pmem.io (C++)

SCM is ATTRACTIVE in the Memory/Storage Hierarchy

In-Memory Technology CHALLENGES
Line between memory and storage is blurring
File systems getting really fast, so the speed gap is closing
• SCM file systems will also be low latency
• File system overhead still limits latency improvements
• Before: disk based vs. in-memory
• After: file system vs. byte addressable object store
Managing multiple layers and types of memory

Fast Closed Loop Analytics, Powered by a Smart, Distributed In-Memory Fabric…
High throughput and large data handling matters
• Throughput, latency, and capacity: each pipeline stage values these differently
Common interfaces, multiple region types
• Meet the needs of many types of best of breed engines
Managing multiple layers of memory and storage
• Speed (latency, throughput) differentiator will diminish
More classifications for data now
• Hot, cold => hot, warm, lukewarm, cold

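The move from two data classes to four (hot, warm, lukewarm, cold) can be sketched as a small classifier. This is only an illustration: the idle-time thresholds and the function names are assumptions made for the example, not values or APIs from Ampool or Geode.

```python
from bisect import bisect_right

# Hypothetical idle-time thresholds in seconds; real cut-offs depend on workload.
TIER_BOUNDS = [60, 3600, 86400]
TIER_NAMES = ["hot", "warm", "lukewarm", "cold"]


def classify(idle_seconds):
    """Map time since last access to one of the four temperature classes."""
    return TIER_NAMES[bisect_right(TIER_BOUNDS, idle_seconds)]


def plan_placement(last_access, now):
    """Group keys by temperature, given each key's last-access timestamp."""
    placement = {name: [] for name in TIER_NAMES}
    for key, accessed_at in last_access.items():
        placement[classify(now - accessed_at)].append(key)
    return placement
```

A tiering layer could use a grouping like this to decide which entries stay in DRAM and which move to slower, cheaper media. How the mapping from temperature to medium is chosen is left open here.
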
…must handle MULTIPLE needs in one fabric
• Need for High Throughput: early stages (ingest, ETL)
• Need for Low Latency: later stages (data-driven insights & actions)

What Matters for App, DB, and Compute?
• The flexibility, suitability, and ease of use of the interfaces
• Memory & storage are managed transparently to provide QoS
• The service guarantee abstractions are provided
• Conflicts are managed and prevented
• Freeing developers from re-inventing the wheel

Introducing…
A Distributed, Memory-Centric Object Store for Closed Loop Analytics

Smart Distributed In-Memory Object Store
PLUGGABLE distributed memory layer …
[Diagram label: 3D XPoint™ ......]

Smart Distributed In-Memory Object Store
… for MANAGED FLEXIBILITY ...
✅ Flexible regions and interfaces for ‘Best of breed’ engines
✅ Extensible Core
✅ Pluggable stores
[Diagram label: 3D XPoint™ ......]

…and FAST OBJECT ACCESS across the pipeline
Pipeline stages: Ingest, ETL, Analytics, App Use
Stakeholders: Data Architect, Data Developers, Business Analysts, Data Scientists

Building on PROVEN In-memory Technology
[Architecture diagram labels: In-Memory Distributed Sys, Low-latency Comms., Key-Value Store, Function Pushdown, + High Throughput Table Store, Native Interface, Pluggable Store Manager, Java API, MASH (CLI Ext), Java API, Smart Data Tiering, Mature Event Model, Tunable Consistency, Metadata/Catalog, Security AuthZ]

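The "Function Pushdown" label on a key-value store refers to running computation where the data lives instead of pulling the data to the caller. The toy sketch below shows the idea with an in-process, hash-partitioned store. The class and function names are hypothetical and do not reflect the Geode or Ampool APIs.

```python
class PartitionedStore:
    """Toy key-value store whose entries are hash-partitioned across nodes."""

    def __init__(self, num_partitions):
        self.partitions = [{} for _ in range(num_partitions)]

    def _partition_for(self, key):
        return self.partitions[hash(key) % len(self.partitions)]

    def put(self, key, value):
        self._partition_for(key)[key] = value

    def get(self, key):
        return self._partition_for(key).get(key)

    def execute(self, fn):
        """Push fn down to every partition; only small partial results return."""
        return [fn(partition) for partition in self.partitions]


def count_over(threshold):
    """Build a function that runs next to the data and returns a single count."""
    def local_count(partition):
        return sum(1 for value in partition.values() if value > threshold)
    return local_count


store = PartitionedStore(num_partitions=4)
for i in range(100):
    store.put(f"sensor-{i}", i)

partials = store.execute(count_over(89))  # one count per partition
total = sum(partials)                     # combine the partial counts
```

In a real distributed store each partition would live on a different server, so only the per-partition counts would cross the network, not the entries themselves.
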
First release covers MULTIPLE analytical needs…
[Logos: ampool + … ORC …]

…and deliver VALUE to all Analytics stakeholders
• Data Architect, Data Developers: no change in data application code; config. changes only
• Business Analysts, Data Scientists: no change in user experience; performance benefits
• Data Admins, Infra/Sys Admins: no added hassles; current mgmt. tools

Contributing Back
Plan for contributions back to Apache Geode:
• Storage pluggability layer
• Off-heap memory pluggability
• SCM plugin (Mnemonic)
• Impersonation support for security
• Region type pluggability

Thank You!
Avinash Dongre, Architect, Ampool India Pvt. Limited, avinash@ampool.io
Robert Geiger, Chief Architect & VP Engineering, Ampool Inc., robert@ampool.io

#GeodeSummit: Democratizing Fast Analytics with Ampool (Powered by Apache Geode)