The document provides an overview of Folio3, a software development partner specializing in designing software solutions across various domains, including enterprise applications, mobile apps, and social media platforms. It also details NoSQL databases, particularly Apache Cassandra, highlighting its features, data model, consistency levels, and motivations for use in modern applications. Additionally, it discusses the ongoing relevance of RDBMS alongside NoSQL systems, noting their unique strengths and weaknesses.
Who We Are We are a Development Partner for our customers Design software solutions, not just implement them Focus on the solution – Platform and technology agnostic Expertise in building applications that are: Mobile Social Cloud-based Gamified
What We Do Areas of Focus Enterprise Custom enterprise applications Product development targeting the enterprise Mobile Custom mobile apps for iOS, Android, Windows Phone, BB OS Mobile platform (server-to-server) development Social Media CMS based websites for consumers and enterprise (corporate, consumer, community & social networking) Social media platform development (enterprise & consumer)
Folio3 At a Glance Founded in 2005 Over 200 full time employees Offices in the US, Canada, Bulgaria & Pakistan Palo Alto, CA. Sofia, Bulgaria Karachi, Pakistan Toronto, Canada
Areas of Focus: Enterprise Automating workflows Cloud based solutions Application integration Platform development Healthcare Mobile Enterprise Digital Media Supply Chain
Areas of Focus: Mobile Serious enterprise applications for Banks, Businesses Fun consumer apps for app discovery, interaction, exercise gamification and play Educational apps Augmented Reality apps Mobile Platforms
Areas of Focus: Web & Social Media Community Sites based on Content Management Systems Enterprise Social Networking Social Games for Facebook & Mobile Companion Apps for games
Agenda What is NOSQL? Motivations for NOSQL? Brewer's CAP Theorem Taxonomy of NOSQL databases Apache Cassandra Features Data Model Consistency Operations Cluster Membership What Does NOSQL Mean for RDBMS?
What is NOSQL? Refers to databases that differ from traditional relational database management systems (RDBMS) Distributed, flexible, horizontally scalable data stores Confusion with the term NOSQL NOSQL != No SQL (or Anti-SQL) NOSQL = Not Only SQL NOSQL is an inaccurate term since it is commonly used to refer to "non-relational" databases, but the term has stuck
Motivations for NOSQL Classical RDBMS unsuitable for today's web applications because: Performance (Latency): Variable Flexibility: Low Scalability: Variable Functionality
Brewer's CAP Theorem Consistency (C) Availability (A) Partition Tolerance (P) Pick any two Most NOSQL databases sacrifice Consistency in favor of high Availability and Performance
Taxonomy of NOSQL Key/Value Stores - Distributed Hash Tables (DHT) Memcached, Amazon’s Dynamo, Redis, PStore Document Stores Semi structured data (stores entire documents) CouchDB, MongoDB, RDDB, Riak Graph Databases * Based on graph theory ActiveRDF, AllegroGraph, Neo4J Object Database * Versant, Objectivity Column-oriented Stores * these are considered soft NOSQL databases and are usually in NOSQL category because of being "non-relational".
Column-Oriented Data Stores Semi-structured column-based data stores Stores each column separately so that aggregate operations for one column of the entire table are significantly quicker than the traditional row storage model Popular examples Hadoop/HBASE Apache Cassandra Google's BigTable HyperTable Amazon's SimpleDB
Apache Cassandra Fully distributed column-oriented data store Also provides Map Reduce implementation using Hadoop (increased performance) Based on Google's BigTable (Data Model) and Amazon's Dynamo (Consistency & Partition Tolerance) Cassandra values Availability and Partition Tolerance (AP) while providing tunable consistency levels.
History Developed at Facebook Released as open source project on Google Code in July 2008 Became an Apache Incubator Project in March 2009 Became a top level Apache project in February 2010 Performance Rumors of Facebook having started working on its own separate version of Cassandra
Features Fully Distributed Highly Scalable Fault Tolerant (No single point of failure) Tunable Consistency (Eventually Consistent) Semi-structured key-value store High Availability No Referential Integrity No Joins
Data Model KeySpace (uppermost namespace) Column Family / Super Column Family (analogous to table) Super Column Column (Name, Value, Timestamp) Rows are referenced through keys Each column family is stored in separate physical files
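The hierarchy on this slide can be pictured as nested maps. The following is a minimal in-memory sketch, not Cassandra's API: the function names (`make_keyspace`, `insert`, `get`) and the sample names in the usage note are hypothetical, and super columns are left out for brevity.

```python
import time
from collections import defaultdict

# Hypothetical model of the hierarchy:
# keyspace -> column family -> row key -> column name -> (value, timestamp)
def make_keyspace():
    return defaultdict(lambda: defaultdict(dict))

def insert(keyspace, column_family, row_key, name, value, timestamp=None):
    """Store a column as (value, timestamp); an older timestamp never overwrites a newer one."""
    ts = time.time() if timestamp is None else timestamp
    row = keyspace[column_family][row_key]
    current = row.get(name)
    if current is None or ts >= current[1]:
        row[name] = (value, ts)

def get(keyspace, column_family, row_key, name):
    """Return the column value, or None if the row or column is absent."""
    column = keyspace[column_family].get(row_key, {}).get(name)
    return None if column is None else column[0]
```

For example, after `ks = make_keyspace()` and `insert(ks, "Users", "alice", "email", "alice@example.com")`, the call `get(ks, "Users", "alice", "email")` returns the stored address. Rows in the same column family need not share the same column names, which is what makes the store semi-structured.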
Apache Cassandra: Consistency Consistency refers to whether a system is left in a consistent state after an operation. In distributed data systems like Cassandra, this usually means that once a writer has written, all readers will see that write. If W + R > N, you get strongly consistent behavior; that is, readers will always see the most recent write. W is the number of nodes to block for on writes, R is the number of nodes to block for on reads, and N is the replication factor (number of replicas).
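The W + R > N rule works because any set of W replicas and any set of R replicas must then overlap in at least one node, so every read touches a replica that holds the latest write. A small sketch of the check, with a hypothetical function name:

```python
def is_strongly_consistent(w: int, r: int, n: int) -> bool:
    """Return True when the write set and read set are guaranteed to overlap.

    w: replicas that must acknowledge a write
    r: replicas that must answer a read
    n: replication factor
    """
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return w + r > n
```

With N = 3, the pair W = 2, R = 2 passes the check, as does W = 3, R = 1; the pair W = 1, R = 1 does not, so a read may return stale data.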
Apache Cassandra: Consistency Relational databases provide strong consistency (ACID) Cassandra provides eventual consistency (BASE), meaning the database will eventually reach a consistent state QUORUM reads and writes give consistency while still allowing availability Q = floor(N / 2) + 1 (simple majority) If latency is more important than consistency, you can lower the values of W, R, or both.
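As a worked example of the quorum formula, the sketch below computes Q with integer division and the number of replica failures a QUORUM operation can survive. The function names are hypothetical.

```python
def quorum(n: int) -> int:
    """Simple majority of n replicas: floor(n / 2) + 1."""
    if n < 1:
        raise ValueError("replication factor must be at least 1")
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """Replicas that may be down while a QUORUM operation still succeeds."""
    return n - quorum(n)
```

For N = 3 this gives Q = 2 with one tolerated failure; for N = 5 it gives Q = 3 with two. Because 2Q is always greater than N, QUORUM writes combined with QUORUM reads always overlap in at least one replica.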
Apache Cassandra: Consistency Levels Write ZERO ANY ONE QUORUM ALL Read ONE QUORUM ALL (ZERO and ANY apply to writes only)
Write Operation Client sends a write request to a random node; the random node forwards the request to the proper node (1st replica responsible for the partition - coordinator) Coordinator sends requests to N replicas If W replicas confirm the write operation then OK Always writable, hinted handoff (If a replica node for the key is down, Cassandra will write a hint to the live replica node indicating that the write needs to be replayed to the unavailable node.)
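The write path and hinted handoff can be sketched with a toy in-memory model. This is an illustration under simplifying assumptions, not Cassandra's implementation: the `Replica` class and both functions are hypothetical, hints are always stored on the first live replica, and no network or disk is involved.

```python
class Replica:
    """Toy replica: holds data plus hints for peers that were down."""
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}    # key -> (value, timestamp)
        self.hints = []   # (target replica, key, value, timestamp)

    def write(self, key, value, ts):
        self.data[key] = (value, ts)

def coordinate_write(replicas, key, value, ts, w):
    """Send the write to all N replicas; succeed if at least w acknowledge."""
    live = [r for r in replicas if r.up]
    for r in live:
        r.write(key, value, ts)
    if live:
        # Hinted handoff: remember writes destined for unavailable replicas.
        for r in replicas:
            if not r.up:
                live[0].hints.append((r, key, value, ts))
    return len(live) >= w

def replay_hints(holder):
    """Deliver stored hints to targets that have come back up."""
    remaining = []
    for target, key, value, ts in holder.hints:
        if target.up:
            target.write(key, value, ts)
        else:
            remaining.append((target, key, value, ts))
    holder.hints = remaining
```

With three replicas and one of them down, a write with W = 2 still succeeds; once the failed replica recovers, `replay_hints` on the hint holder brings it up to date. Note that in this sketch live replicas keep the write even when fewer than W acknowledge.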
Read Operation Coordinator sends requests to N replicas, if R replicas respond then OK If different versions are returned then reconcile and write back the reconciled version (Read Repair)
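Read repair can be sketched in a few lines. The model below is a simplification: each replica is a plain dict mapping a key to a (value, timestamp) pair, `None` stands for a replica that did not respond, and reconciliation is assumed to mean "the newest timestamp wins". The function name is hypothetical.

```python
def coordinate_read(replicas, key, r):
    """Read from the responding replicas, reconcile, and repair stale copies.

    replicas: list of dicts (key -> (value, timestamp)); None marks a replica
              that did not respond.
    r: number of replicas that must respond for the read to succeed.
    """
    live = [rep for rep in replicas if rep is not None]
    if len(live) < r:
        raise RuntimeError("not enough replicas responded")
    versions = [rep[key] for rep in live if key in rep]
    if not versions:
        return None
    # Reconcile: assume the column with the newest timestamp is the winner.
    newest = max(versions, key=lambda column: column[1])
    # Read repair: write the reconciled version back to stale or empty replicas.
    for rep in live:
        if rep.get(key) != newest:
            rep[key] = newest
    return newest[0]
```

If one replica holds an older version and another holds nothing, a single read returns the newest value and leaves all responding replicas in agreement.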
Cluster Membership Gossip Protocol Every T seconds each node increments its heartbeat counter and gossips to another node about the state of the cluster; the receiving node merges the cluster info with its own copy Cluster state (node in/out, failure) propagated quickly: O(log N) where N is the number of nodes in the cluster
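The heartbeat-and-merge step can be sketched as follows. This is a toy model with hypothetical names (`Node`, `gossip_round`): each node keeps a map of node name to the highest heartbeat it has seen, and merging keeps the larger counter for every entry. Failure detection and the every-T-seconds timer are left out.

```python
import random

class Node:
    def __init__(self, name):
        self.name = name
        self.heartbeats = {name: 0}   # node name -> highest heartbeat seen

    def tick(self):
        """Increment this node's own heartbeat counter."""
        self.heartbeats[self.name] += 1

    def merge(self, remote_state):
        """Merge a peer's view into ours, keeping the newer counter per node."""
        for node, beat in remote_state.items():
            if beat > self.heartbeats.get(node, -1):
                self.heartbeats[node] = beat

def gossip_round(nodes, rng=random):
    """Each node ticks, then gossips its view to one randomly chosen peer."""
    for node in nodes:
        node.tick()
        peer = rng.choice([n for n in nodes if n is not node])
        peer.merge(node.heartbeats)
```

Because every round lets each informed node pass its view on to another node, knowledge of a change tends to spread to the whole cluster in a number of rounds that grows logarithmically with cluster size.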
Storage Ring Cassandra cluster nodes are organized in a virtual ring. Each node has a single unique token that defines its place in the ring and which keys it is responsible for Key ranges are adjusted when the nodes join or leave
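A token ring lookup can be sketched with a sorted list and binary search. The `Ring` class is hypothetical, and the sketch assumes each node owns the range from the previous node's token (exclusive) up to its own token (inclusive), with the first node also covering the wrap-around range.

```python
import bisect

class Ring:
    """Toy token ring: each node owns (previous token, own token]."""
    def __init__(self):
        self.tokens = []   # sorted node tokens
        self.owners = {}   # token -> node name

    def add_node(self, name, token):
        if token in self.owners:
            raise ValueError("token already taken")
        bisect.insort(self.tokens, token)
        self.owners[token] = name

    def remove_node(self, token):
        self.tokens.remove(token)
        del self.owners[token]

    def owner_of(self, key_token):
        """Find the first node token >= key_token, wrapping past the end."""
        if not self.tokens:
            raise LookupError("ring is empty")
        i = bisect.bisect_left(self.tokens, key_token)
        return self.owners[self.tokens[i % len(self.tokens)]]
```

With nodes A, B, and C at tokens 0, 100, and 200, a key whose token is 150 belongs to C and a key at 250 wraps around to A. Adding a node D at token 150 moves only the range (100, 150] from C to D; the other ranges are untouched, which is how key ranges adjust when nodes join or leave.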
Apache Cassandra: MySQL Comparison MySQL (> 50 GB data) Read Average: ~ 350 ms Write Average: ~ 300 ms Cassandra (> 50 GB data) Read Average: 15 ms Write Average: 0.12 ms
Apache Cassandra: Where to Use? Use Cassandra, if you want/need High write throughput Near-Linear scalability Automated replication/fault tolerance Can tolerate low consistency Can tolerate missing RDBMS features
Apache Cassandra: Users Facebook (of course) To power inbox search (previously) Twitter To handle user relationships, analytics (but not for tweets) Digg & Reddit Both use Cassandra to handle user comments and votes Rackspace IBM To build scalable email system Cisco's WebEx To store user feed and activity in near real time
What does NOSQL mean for the future of RDBMS? No worries! RDBMSs are here to stay for the foreseeable future NOSQL data stores can be used in combination with RDBMS in some situations NOSQL still has a long way to go, in order to reach the widespread (mainstream) use and support of the RDBMS
Weakness of NOSQL No or limited support for complex queries No transactions available (only individual operations are atomic) No standard interface for NOSQL databases (like SQL in relational databases) No or limited administrative features available for NOSQL databases Not suitable (yet) for mainstream use
Why Still Use RDBMS? All the weaknesses of NOSQL Relational databases are widely used and understood RDBMS DBAs and developers are easily available in the market For big business, relational databases are a safe choice because they have heavily invested in relational technology Many database design and development tools available