Overview of NoSQL

Overview of NoSQL ...motivation, technologies, should you care?

Overview
● Evolution of/motivation for NoSQL databases
● Characterization of NoSQL databases
● Classification of NoSQL databases
● Popularity/usage of NoSQL systems

A brief history of NoSQL
● Originally coined in 1998 by Strozzi for a specific non-relational database
  ○ easy to use, free, text-based data storage, easy manipulation of contents of DB
● Reintroduced by Evans (Rackspace) in 2009 for a conference on open source distributed databases
  ○ in response to increase in interest in non-RDBMS solutions
    ■ bringing together Cassandra, MongoDB, CouchDB, etc.
● Has grown as a movement over the last 3 years

Current status
● Significant buzz within community in 2010
  ○ initial development of technology
  ○ pioneer deployments
  ○ lots of meetups/conferences/birds-of-a-feather sessions
● Many key technologies evolved in late 2010 and 2011
  ○ more large deployments for some technologies
  ○ small companies with no legacy systems basing operations on NoSQL

Current status
● 2012
  ○ buzz/hype is fading
  ○ technology continues to mature
  ○ increased number of deployments
  ○ skills sought in job market

NoSQL - a negative definition
● NoSQL simply defined by being non-relational
  ○ diverse set of technologies fall into NoSQL camp
● Motivations mixed
  ○ open source
  ○ scale - TB, PB - particularly for read/write latency
  ○ increased flexibility over RDBMS systems
  ○ ability to work with raw data
  ○ ACID not always most appropriate design choice
    ■ analytics data is excellent example
● Results in many different NoSQL technologies

Typical characteristics
● Don't use SQL!
● Open source
● Intended to deliver performance
  ○ in some dimension
● Typically JOIN not supported
  ○ performance hit
● Consistency often relaxed
  ○ eventual consistency
● More flexibility in schema
  ○ if schema used at all!
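
The "eventual consistency" bullet above can be illustrated with a minimal sketch. It assumes two in-memory replicas that reconcile with a last-write-wins rule on a caller-supplied version number. The names `Replica`, `write`, `read`, and `sync_from` are hypothetical and are not taken from any particular NoSQL system; real systems may use timestamps, vector clocks, or other schemes.

```python
class Replica:
    """Toy replica: keeps a (version, value) pair per key."""

    def __init__(self, name):
        self.name = name
        self._data = {}  # key -> (version, value)

    def write(self, key, value, version):
        # Last-write-wins (an assumption of this sketch): keep the higher version.
        current = self._data.get(key)
        if current is None or version > current[0]:
            self._data[key] = (version, value)

    def read(self, key):
        entry = self._data.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        # Background reconciliation: pull every entry the other replica holds.
        for key, (version, value) in other._data.items():
            self.write(key, value, version)


a, b = Replica("a"), Replica("b")
a.write("profile:1", "v1", version=1)
stale = b.read("profile:1")   # b has not seen the write yet
b.sync_from(a)
fresh = b.read("profile:1")   # after syncing, both replicas agree
```

A read from `b` before the sync returns nothing, and the same read afterwards returns the value written to `a`: the replicas converge over time rather than at write time.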

Diversity of NoSQL databases
● 122 separate technologies listed on http://nosql-database.org/
  ○ mix of commercial, open source and some in between
● Vary in many dimensions:
  ○ architecture
  ○ interfaces
    ■ API/languages
  ○ internal data storage
  ○ distribution mechanisms
    ■ redundancy, reliability
  ○ usage - deployments & support community
  ○ maturity

Classification of NoSQL systems
● Column based solutions
● Document store solutions
● Key/value solutions
● Graph based solutions
● Less significantly:
  ○ XML databases
  ○ Object databases
  ○ Multivalue databases

Column based solutions
● Structured data
  ○ similar to classical tables
● Generally much more flexible
  ○ no rigorous schema necessary
  ○ can typically add columns in ad hoc fashion
    ■ often without explicitly declaring column
● However, can result in very different usage
  ○ e.g. can have millions of columns associated with a given row
● Examples: Hadoop/HBase, Cassandra, Hypertable, SimpleDB
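
A minimal sketch of the column based data model described above, assuming a purely in-memory table. `WideRowTable` and its methods are hypothetical names for illustration only and do not correspond to the API of HBase, Cassandra, or any other listed system.

```python
from collections import defaultdict


class WideRowTable:
    """Toy column based table: each row key maps to its own set of columns."""

    def __init__(self):
        # row key -> {column name -> value}; rows need not share columns
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # Columns come into existence on first write; nothing is declared up front.
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        return self._rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        return sorted(self._rows.get(row_key, {}))


table = WideRowTable()
table.put("user:1", "name", "Ada")
table.put("user:1", "email", "ada@example.org")
# A second row with a completely different, ad hoc set of columns,
# here one column per timestamp (this is how rows can grow very wide).
table.put("sensor:7", "2012-03-01T10:00", 21.5)
table.put("sensor:7", "2012-03-01T10:05", 21.7)
```

The two rows share no columns at all, which is the flexibility the slide refers to; a time series stored this way adds one column per reading.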

Document based solutions
● Less structured data
  ○ DB composed of 'documents' containing arbitrary data
    ■ usually containing longer form content, e.g. CMS
● Documents contain some structure to support query/search/filter, etc.
● Somewhat less emphasis on a key
  ○ can be autogenerated
● Quite unlike classical databases
● Examples: MongoDB, CouchDB
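
The document model above can be sketched as follows, assuming documents are plain Python dicts held in memory. `DocumentStore`, `insert`, and `find` are hypothetical names chosen for this illustration; they are not the MongoDB or CouchDB API.

```python
import uuid


class DocumentStore:
    """Toy document store: documents are dicts with arbitrary fields."""

    def __init__(self):
        self._docs = {}  # document id -> document

    def insert(self, document, doc_id=None):
        # The key matters less here, so it is autogenerated when not supplied.
        doc_id = doc_id or uuid.uuid4().hex
        self._docs[doc_id] = dict(document)
        return doc_id

    def find(self, **criteria):
        # Return every document whose fields match all given criteria.
        return [
            doc for doc in self._docs.values()
            if all(field in doc and doc[field] == value
                   for field, value in criteria.items())
        ]


store = DocumentStore()
article_id = store.insert({"type": "article", "title": "Intro to NoSQL",
                           "body": "Long form text...", "tags": ["nosql"]})
# A document with a different shape lives in the same store.
store.insert({"type": "page", "title": "About us"})
articles = store.find(type="article")
```

The fields inside each document are what make filtering possible, even though no two documents are required to have the same fields.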

Key/value stores
● DBs inspired by memcached
  ○ simple, fast key/value stores
● Attempt to retain most of DB in memory
  ○ fast response times
● Different designs for scalability
  ○ single node/multi node
● Much emphasis on the keys in this type of DB
● Write usually overwrites entire previous entry
● Examples: Redis, Couchbase/Membase, DynamoDB, Riak
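
A minimal single-node sketch of the key/value model above, assuming everything is kept in memory. `KeyValueStore` and its methods are hypothetical names for illustration and are not the interface of Redis, Riak, or the other listed systems.

```python
class KeyValueStore:
    """Toy in-memory key/value store; values are opaque to the store."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # A write replaces the whole previous entry; there is no partial update.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Returns True if the key existed.
        existed = key in self._data
        self._data.pop(key, None)
        return existed


store = KeyValueStore()
store.set("session:42", {"user": "ada", "cart": ["book"]})
store.set("session:42", {"user": "ada"})  # overwrites: the cart is gone
```

Because the store cannot look inside a value, all access goes through the key, which is why key design gets so much attention in this category.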

Graph based solutions
● Obviously different from previous categories
  ○ focus specifically on graphs
● Queries supported are graph-specific
  ○ e.g. get nodes related to specified node
● Typically support for solving standard graph problems
  ○ e.g. shortest path, general graph traversal
● Can deliver very significant performance gains over non-graph-specific solutions
  ○ for graph problems!
● Examples: Neo4j
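
The two kinds of query mentioned above (related nodes and shortest path) can be sketched on a small undirected, unweighted graph. This is an in-memory illustration using breadth-first search; `Graph`, `add_edge`, `neighbours`, and `shortest_path` are hypothetical names and are not Neo4j's API.

```python
from collections import deque


class Graph:
    """Toy undirected graph stored as an adjacency map."""

    def __init__(self):
        self._adj = {}  # node -> set of neighbouring nodes

    def add_edge(self, a, b):
        self._adj.setdefault(a, set()).add(b)
        self._adj.setdefault(b, set()).add(a)

    def neighbours(self, node):
        # "Get nodes related to the specified node".
        return set(self._adj.get(node, ()))

    def shortest_path(self, start, goal):
        # Breadth-first search: finds a path with the fewest edges, or None.
        if start not in self._adj or goal not in self._adj:
            return None
        previous = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path = []
                while node is not None:
                    path.append(node)
                    node = previous[node]
                return path[::-1]
            for nxt in self._adj[node]:
                if nxt not in previous:
                    previous[nxt] = node
                    queue.append(nxt)
        return None


g = Graph()
for a, b in [("alice", "bob"), ("bob", "carol"),
             ("carol", "dave"), ("alice", "erin")]:
    g.add_edge(a, b)
path = g.shortest_path("alice", "dave")
```

In a relational database the same traversal would typically need one self-join per hop, which is the kind of workload where a graph-specific store is expected to do better.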

It's a noisy space...
● Very many candidate technologies
● Relatively small number of real-world solutions
● Differences between the classifications above are a matter of emphasis...
  ○ column based and document based arrive at semi-structured sweet spot from opposite ends of spectrum
● ...although this results in different preferred use cases...
  ○ document based solution better for document problems, e.g. CMS

Common techniques used
● Hashing techniques used to map data to nodes in cluster
● Internode communication via Gossip
● Common replication techniques
● Thrift is used in a few cases
● MapReduce often used to search over distributed system
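
One common way to map data to nodes by hashing is consistent hashing, sketched below. The slide does not say which hashing scheme any given system uses, so treat this as one illustrative option. `HashRing`, `node_for`, the `replicas` parameter, and the node names are assumptions made for this example.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring mapping keys to cluster nodes."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring at `replicas` pseudo-random points
        # so that keys spread more evenly across nodes.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._hashes = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first ring point after the key's hash,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:1001")
```

The appeal of this scheme is that removing a node only moves the keys that node owned; keys on the remaining nodes stay where they are.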

Horses for courses...
● SQL is a perfectly good solution for many problems
  ○ tried and tested
● Some problems require an alternative solution
  ○ typically driven by scale and/or flexibility
● NoSQL offers (many) alternatives
  ○ although relatively easy to identify realistic options
● Column based approaches good for mostly structured data with enhanced flexibility
● Document based approaches good for document oriented problems

...so let's dive into one NoSQL database...
● Cassandra...
