Cassandra Day Atlanta 2015: Python & Cassandra

This should be boring • Talking to a database should not be any of the following: • Exciting • "AH HA!" • Confusing git@github.com:rustyrazorblade/python-presentation.git

Agenda • Go over driver basic concepts • Connecting • Perform queries • Introduce object mapper (cqlengine) • Application integration

DataStax Native Python Driver • Talks to Cassandra • Connection pooling • Aware of cluster topology • Automatic retries / failure management • Load balancing • Will include object mapper (cqlengine) in next release • Fully Open Source (Apache License)

Connect to Cassandra • Import and create a Cluster instance • Cluster takes options such as load balancing policy, reconnect policy, retry policy • On connection, driver discovers entire cluster automatically

Executing queries • CQL: Similar to SQL • session.execute() • Create tables, insert, selects • Can accept simple strings • Not token aware

Prepared Statements • Use for all queries (inserts / updates / deletes) • Decrease server load • Increase security • Allows for token aware queries

Async Queries • Prepared statements required! • Much faster than sync • Utilize the entire cluster • Driver can help us here • We can use futures

1 statement = """INSERT INTO sensor 2 (sensor_id, name, created_at) 3 VALUES (?, ?, ?)""" 4 5 insert_sensor = session.prepare(statement) 6 7 def create_sensor_entries_callback(response, sensor_id): 8 print "CALLBACK" 9 10 for x in range(10): 11 sensor_data = (uuid.uuid4(), "sensor %d" % x, datetime.now()) 12 future = session.execute_async(insert_sensor, sensor_data) 13 future.add_callback(create_sensor_entries_callback, sensor_id) 14 Async Queries w/ Callbacks callback function add callback

1 from cassandra.concurrent import execute_concurrent_with_args 2 3 stmt = """SELECT * FROM sensor_data WHERE sensor_id=? 4 ORDER BY created_at DESC LIMIT 1""") 5 6 select_statement = session.prepare(stmt) 7 8 sensor_ids = [["f472d5ff-0c76-404a-8044-038db416685e"], 9 ["940cb741-d5b5-4c5d-82f5-bf1aa61c6d47"], 10 ["497d4b2c-cba2-4d0f-bd80-42de612690fd"], 11 ["1bdeac75-7e12-43ba-80b5-2d38405f9843"] 12 13 result = execute_concurrent_with_args(session, select_statement, sensor_ids) Async Queries (managed) prepared statement automatically manages concurrency

Performance Considerations • Like SQL, CQL features IN() but in general, it's terrible for performance • Results in more GC & perf problems • BATCH has the same issue • Failure to get a single result causes entire IN() or batch to retry

Deﬁning Models • Each model maps to a single table • Every model inherits from cassandra.cqlengine.models.Model • Define fields in your table programatically • Collections map to native Python types (lists, sets, dict) • Table management included (no need to write ALTER)

Model with Collections • Sets & Maps are most useful • Use to denormalize • Lists can have performance issues if misused 1 class Message(Model): 2 message_id = TimeUUID(primary_key=True, default=uuid1) 3 subject = Text() 4 body = Text() 5 addressed_to = Set(UUID) 6 7 class Photo(Model): 8 photo_id = UUID(primary_key=True, default=uuid4) 9 title = Text() 10 likes = Map<UUID, Text>

Clustering Keys • Automatically determined by ordering in model • First primary key is partition key • The rest are clustering keys 1 class UsersInGroup(Model): 2 group_id = UUID(primary_key=True) 3 user_id = UUID(primary_key=True) 4 is_admin = Boolean() 5 6 1 class UsersInGroupByState(Model): 2 group_id = UUID(primary_key=True, partition_key=True) 3 state = Text(primary_key=True, partition_key=True 4 user_id = UUID(primary_key=True) 5 is_admin = Boolean(default=False)

Inserting Data • Model.create(**kwargs) • Performs validation • Supports custom validation • Supports TTLs

Lightweight Transactions • Uses paxos for consensus • IF NOT EXISTS for INSERT • IF FIELD=VALUE for UPDATE • Use sparingly - requires several round trips

Batches • Use only to maintain multiple views (for consistency purposes) 1 class User(Model): 2 name = Text(primary_key=True) 3 twitter = Text() 4 email = Text() 5 6 class TwitterToUser(Model): 7 twitter = Text(primary_key=True) 8 name = Text() 9 10 (twitter, name) = ("rustyrazorblade", "jon") 11 12 with BatchQuery() as b: 13 User.batch(b).create(name=name, twitter=twitter) 14 EmailToUser.batch(b).create(twitter=twitter, name=name)

Fetching a Row • Model.get() can be used to fetch a single row • Will throw a DoesNotExist exception if not found

Fetching Many Rows • Model.objects() accepts any filter acceptable to Cassandra

Table Properties • Every table option supported • Compaction • gc_grace_seconds • read repair chance • caching

Table Inheritance • Multiple tables with similar fields • Query Pattern: filtering

Table Polymorphism • Similar to inheritance • Uses a single table • Query pattern: select all types

Virtual Environments • virtualenv is your friend! • mkvirtualenv also your friend! • pip install mkvirtualenv Flask==0.10.1 blist==1.3.6 cassandra-driver==2.1.2 Flask==0.9.0 rednose==0.4.1 ipdb==0.7 ipdbplugin==1.2 ipython==2.3.1 mock==1.0.1 nose==1.3.4 All sandboxed environments

Integrations • Django • django-cassandra-engine • Integrates with manage.py • Flask • use @app.before_first_request • General rule: connect post-fork

Cassandra Day Atlanta 2015: Python & Cassandra

More Related Content

What's hot

Similar to Cassandra Day Atlanta 2015: Python & Cassandra

More from DataStax Academy

Recently uploaded

Cassandra Day Atlanta 2015: Python & Cassandra