Mongo db improve the performance of your application codemotion2016

MADRID · NOV 18-19 · 2016 19th November 2016 18:30-19:15 MongoDB: Improve the performance of your application by @juanroycouto @codemotion_es 1

MADRID · NOV 18-19 · 2016 #whoami ❖ @juanroycouto ❖ www.juanroy.es ❖ www.mongodbspain.com/en/author/juanroy ❖ juanroycouto@gmail.com ❖ MongoDB DBA at Grupo Undanet 2

MADRID · NOV 18-19 · 2016 #Agenda 3 ● Data Model ● Application Patterns ● .explain() ● Indexes ● Storage Engines ● Query Optimizer ● Aggregation Framework ● Profiling ● Tuning ● Sharding ● Storage & Compression

MADRID · NOV 18-19 · 2016 #Data Model 4

MADRID · NOV 18-19 · 2016 #Data Model 5 ● All the information in one document: ○ No joins ○ Single read ○ Higher performance ○ ACID compliance at the document level ● MMAPv1: avoid unbounded document growth ● WiredTiger rewrites the document for each update

MADRID · NOV 18-19 · 2016 #Application Patterns 6 ● Modify only those fields that have changed ● Project only the fields you need ● Avoid negation in queries ● Use Covered Queries when possible ● Avoid scatter-gather queries ● Write Concern: ○ w : 0 → faster than w : all

MADRID · NOV 18-19 · 2016 #.explain() 7 ● Use .explain() to get the time a query needs ● Verbose options: ○ queryPlanner (default) ○ executionStats (stats for the winner plan) ○ allPlansExecution (stats for all candidate plans)

MADRID · NOV 18-19 · 2016 #.explain() 8 ● In-memory sorting data is limited to 32MB (you need an index) ● My index selectivity is the relationship between nReturned and totalKeysExamined

MADRID · NOV 18-19 · 2016 #Indexes 9

MADRID · NOV 18-19 · 2016 #Indexes 10 ● Improve the performance of read-intensive apps ● Overhead: ○ Write operations ○ Disk usage ○ Memory consumption

MADRID · NOV 18-19 · 2016 ● Working Set is made of: ○ Indexes ○ Your most frequently accessed data #Indexes-Working Set 11

MADRID · NOV 18-19 · 2016 ● If your Working Set does not fit into RAM: ○ Increment your RAM ○ Shard the database ○ Avoid scan all documents in the collection ■ Page faults ■ You will fill in your Working Set with infrequent accessed data #Indexes-Working Set 12

MADRID · NOV 18-19 · 2016 #Indexes-Before creating indexes 13 ● Understand your data ● How your queries will be ● Create as few indexes as possible ● Put your highly selective fields first in the index ● Check your current indexes before creating new ones

MADRID · NOV 18-19 · 2016 #Indexes-Advices 14 ● Do not index a little cardinality field ● Check your index performance ● Monitor your system to check if: ○ Your strategy is still up to date ○ You need new indexes ○ You can remove unnecessary indexes

MADRID · NOV 18-19 · 2016 #Indexes-Compound 15 ● A compound index has better performance than: ○ Single indexes ○ Intersection ● Remove indexes that are prefixes of other ● The value of the combined fields must have high cardinality

MADRID · NOV 18-19 · 2016 #Indexes-Compound 16 >db.collection.createIndex( { equality, sort, range } ); >db.collection.find( { book_author : ‘John’, #equality book_id : { $gt : 54 } #range } ) .sort( { book_price : 1 } ); #sort

MADRID · NOV 18-19 · 2016 > use test > db.example.createIndex( { a : 1, b : 1 } ); > db.example.explain("executionStats").find( { a : 56, b : 87 } ); "nReturned" : 100, "executionTimeMillis" : 2, "totalKeysExamined" : 100, "totalDocsExamined" : 100, > db.example.dropIndex( { a : 1, b : 1 } ); > db.example.createIndex( { a : 1 } ); > db.example.createIndex( { b : 1 } ); > db.example.explain("executionStats").find( { a : 56, b : 87 } ); "nReturned" : 100, "executionTimeMillis" : 36, "totalKeysExamined" : 10000, "totalDocsExamined" : 10000 #Indexes-Compound example 17

MADRID · NOV 18-19 · 2016 #Indexes-Covered Queries 18 ● They return results from the indexes ● A covered query does not read from the document level: ○ ndocsExamined : 0 ● In the index must be all the fields: ○ Included in the query ○ Returned by the query

MADRID · NOV 18-19 · 2016 #Indexes-Covered Queries example 19 > use test > db.example.createIndex( { a : 1, b : 1 } ); > db.example.explain("executionStats").find( { a : 56, b : 87 } ); "nReturned" : 100, "executionTimeMillis" : 3, "totalKeysExamined" : 100, "totalDocsExamined" : 100 > db.example.explain("executionStats").find( { a : 56, b : 87 }, { _id : 0, a : 1, b : 1 } ); "nReturned" : 100, "executionTimeMillis" : 3, "totalKeysExamined" : 100, "totalDocsExamined" : 0

MADRID · NOV 18-19 · 2016 #Indexes-TTL 20 ● TTL (TimeToLive) Indexes ○ Allow the user to specify a period of time after which the data will be automatically deleted

MADRID · NOV 18-19 · 2016 #Indexes-TTL example 21 > use test > db.timetolive.createIndex( { date : 1 } , { expireAfterSeconds: 20 } ); > db.timetolive.insert( { date : new Date() } ); > db.timetolive.find().pretty() { "_id" : ObjectId("581fb43f5626d57bcf66fe73"), "date" : ISODate("2016-11-06T22:52:47.373Z") } > db.timetolive.count() 0 > The background task that removes expired documents runs every 60 seconds.

MADRID · NOV 18-19 · 2016 #Indexes-Partial 22 ● Partial Indexes ○ Allow to index a subset of your data by specifying a filtering expression during the index creation ○ Better query performance while reducing system overhead

MADRID · NOV 18-19 · 2016 #Indexes-Partial example > use test > db.sysprofile.createIndex( { ns : 1, op : 1 }, { partialFilterExpression : { millis : { $gt : 10000 } } } ); > db.sysprofile.explain().find( { ns : 'school2.students', op : 'query' }, { millis : 1 } ); "winningPlan" : { "inputStage" : { "stage" : "COLLSCAN" > db.sysprofile.explain().find( { ns : 'school2.students', op : 'query', millis : { $gte : 20000 } }, { millis : 1 } ); "winningPlan" : { "inputStage" : { "stage" : "IXSCAN" 23

MADRID · NOV 18-19 · 2016 #Indexes-Sparse 24 ● Sparse Indexes ○ Only contain entries for documents that contain a specific field ○ Smaller and more efficient indexes

MADRID · NOV 18-19 · 2016 > use test > db.test.createIndex( { b : 1 } , { sparse : true } ); > db.test.stats() 16384 > > db.test.createIndex( { b : 1 } ); > db.test.stats() 2580480 > #Indexes-Sparse example 25

MADRID · NOV 18-19 · 2016 #Storage Engines-WiredTiger 26 ● Reduced storage space and higher I/O scalability ● It compresses indexes into RAM, freeing up Working Set for documents ● Performance (granular concurrency control) ● Native compression: ○ per collection ○ per index

MADRID · NOV 18-19 · 2016 #Storage Engines-WiredTiger 27 ● Types of compression: ○ Snappy (default): Data and Journal ○ zlib: maximum compression ○ Prefix: for indexes

MADRID · NOV 18-19 · 2016 #Storage Engines: In-Memory 28 ● Extreme performance coupled with real time analytics for the most demanding, latency-sensitive applications ● Delivers the extreme throughput and predictable latency ● Eliminates the need for separate caching layers

MADRID · NOV 18-19 · 2016 #Query Optimizer 29 ● It selects the best index to use ● It periodically runs alternate query plans and selects the index with the best response time for each query type ● The results of these tests are stored as a cached query plan and are updated periodically

MADRID · NOV 18-19 · 2016 #Aggregation Framework 30

MADRID · NOV 18-19 · 2016 #Aggregation Framework 31 ● Use $indexStats to determine how frequently your index is used ● Each shard has its own statistics ● Stats are reset when you restart your mongod service >db.collection.aggregate([ { $indexStats : {} } ])

MADRID · NOV 18-19 · 2016 #Aggregation Framework example 32 > db.test.aggregate([ { $indexStats : {} } ]).pretty() { "name" : "b_1", "key" : { "b" : 1 }, "host" : "hostname:27017", "accesses" : { "ops" : NumberLong(4), "since" : ISODate("2016-11-07T00:05:39.771Z") } } ...

MADRID · NOV 18-19 · 2016 #Tools-Compass 33

MADRID · NOV 18-19 · 2016 #Tools-Compass 34 ● MongoDB Compass ○ Analyze the size and usage of the indexes ○ It offers visual plan explanations ○ Index creation ○ Real time stats ○ Identify and fix performance and data problems

MADRID · NOV 18-19 · 2016 #Tools-Ops Manager 35 ● The Visual Query Profiler allows to analyze a query performance and recommends on the addition of indexes ● It can be used to visualize output from the profiler when identifying slow queries

MADRID · NOV 18-19 · 2016 #Tools-Cloud Manager & Atlas 36 ● Use the Query Targeting chart

MADRID · NOV 18-19 · 2016 #Tools-Profiling 37 ● Guess the slowest queries for improving their indexes ● Log information for all events or only those whose duration exceeds a configurable threshold (100ms default)

MADRID · NOV 18-19 · 2016 #Tools-Profiling example 38 > db.setProfilingLevel(2) { "was" : 0, "slowms" : 100, "ok" : 1 } > db.example.find( { c : 98 } ); > db.system.profile.find().pretty() { "op" : "query", "ns" : "test.example", "query" : { "find" : "example", "filter" : { "c" : 98 } }, … "millis" : 6

MADRID · NOV 18-19 · 2016 #Tuning-SSD 39 ● Most disk access patterns are not sequential ● Data files benefit from SSDs ● MongoDB journal files have high sequential write patterns (good for spinning disks) ● RAM/SSDs better than spinning drives ● Use high performance storage ● Avoid networked storage

MADRID · NOV 18-19 · 2016 #Tuning-CPU 40 ● MongoDB has better performance on faster CPUs ● The MongoDB WiredTiger Storage Engine is better able to saturate multi-core processor resources than the MMAPv1 Storage Engine

MADRID · NOV 18-19 · 2016 #Tuning-Thread 41 ● The WiredTiger Storage Engine is multi-threaded and can take advantage of many CPU Cores

MADRID · NOV 18-19 · 2016 #Sharding 42 ● The balancer has been moved from the mongos to the primary member of the config server ● Sharding improves the performance of reads ● Benefits (even with a ‘poor’ shard key and scatter-gather): ○ A larger amount of memory to store indexes ○ Parallelize queries across shards reducing latency

MADRID · NOV 18-19 · 2016 #Storage & Compression types 43 ● Balance query latency with storage density and cost by assigning data sets based on a value such as a timestamp to specific storage devices

MADRID · NOV 18-19 · 2016 #Storage & Compression types 44 ● Recent, frequently accessed data can be assigned to high performance SSDs with Snappy compression ● Older, less frequently accessed data is tagged to lower-throughput hard disk drives compressed with zlib to attain maximum storage density with a lower cost-per-bit ● As data ages, MongoDB will automatically migrates it between shards

MADRID · NOV 18-19 · 2016 #Questions? 45 Questions?

MongoDB: Improve the performance of your application by @juanroycouto MADRID · NOV 18-19 · 2016 46 Thank you for your attention!

Mongo db improve the performance of your application codemotion2016

More Related Content

What's hot

Similar to Mongo db improve the performance of your application codemotion2016

Recently uploaded

Mongo db improve the performance of your application codemotion2016