MADRID · NOV 18-19 · 2016 19th November 2016 18:30-19:15 MongoDB: Improve the performance of your application by @juanroycouto @codemotion_es 1
MADRID · NOV 18-19 · 2016 #whoami ❖ @juanroycouto ❖ www.juanroy.es ❖ www.mongodbspain.com/en/author/juanroy ❖ juanroycouto@gmail.com ❖ MongoDB DBA at Grupo Undanet 2
MADRID · NOV 18-19 · 2016 #Agenda 3 ● Data Model ● Application Patterns ● .explain() ● Indexes ● Storage Engines ● Query Optimizer ● Aggregation Framework ● Profiling ● Tuning ● Sharding ● Storage & Compression
MADRID · NOV 18-19 · 2016 #Data Model 4
MADRID · NOV 18-19 · 2016 #Data Model 5 ● All the information in one document: ○ No joins ○ Single read ○ Higher performance ○ ACID compliance at the document level ● MMAPv1: avoid unbounded document growth ● WiredTiger rewrites the document for each update
MADRID · NOV 18-19 · 2016 #Application Patterns 6 ● Modify only those fields that have changed ● Project only the fields you need ● Avoid negation in queries ● Use Covered Queries when possible ● Avoid scatter-gather queries ● Write Concern: ○ w : 0 → faster than w : all
MADRID · NOV 18-19 · 2016 #.explain() 7 ● Use .explain() to get the time a query needs ● Verbose options: ○ queryPlanner (default) ○ executionStats (stats for the winner plan) ○ allPlansExecution (stats for all candidate plans)
MADRID · NOV 18-19 · 2016 #.explain() 8 ● In-memory sorting data is limited to 32MB (you need an index) ● My index selectivity is the relationship between nReturned and totalKeysExamined
MADRID · NOV 18-19 · 2016 #Indexes 9
MADRID · NOV 18-19 · 2016 #Indexes 10 ● Improve the performance of read-intensive apps ● Overhead: ○ Write operations ○ Disk usage ○ Memory consumption
MADRID · NOV 18-19 · 2016 ● Working Set is made of: ○ Indexes ○ Your most frequently accessed data #Indexes-Working Set 11
MADRID · NOV 18-19 · 2016 ● If your Working Set does not fit into RAM: ○ Increment your RAM ○ Shard the database ○ Avoid scan all documents in the collection ■ Page faults ■ You will fill in your Working Set with infrequent accessed data #Indexes-Working Set 12
MADRID · NOV 18-19 · 2016 #Indexes-Before creating indexes 13 ● Understand your data ● How your queries will be ● Create as few indexes as possible ● Put your highly selective fields first in the index ● Check your current indexes before creating new ones
MADRID · NOV 18-19 · 2016 #Indexes-Advices 14 ● Do not index a little cardinality field ● Check your index performance ● Monitor your system to check if: ○ Your strategy is still up to date ○ You need new indexes ○ You can remove unnecessary indexes
MADRID · NOV 18-19 · 2016 #Indexes-Compound 15 ● A compound index has better performance than: ○ Single indexes ○ Intersection ● Remove indexes that are prefixes of other ● The value of the combined fields must have high cardinality
MADRID · NOV 18-19 · 2016 #Indexes-Compound 16 >db.collection.createIndex( { equality, sort, range } ); >db.collection.find( { book_author : ‘John’, #equality book_id : { $gt : 54 } #range } ) .sort( { book_price : 1 } ); #sort
MADRID · NOV 18-19 · 2016 > use test > db.example.createIndex( { a : 1, b : 1 } ); > db.example.explain("executionStats").find( { a : 56, b : 87 } ); "nReturned" : 100, "executionTimeMillis" : 2, "totalKeysExamined" : 100, "totalDocsExamined" : 100, > db.example.dropIndex( { a : 1, b : 1 } ); > db.example.createIndex( { a : 1 } ); > db.example.createIndex( { b : 1 } ); > db.example.explain("executionStats").find( { a : 56, b : 87 } ); "nReturned" : 100, "executionTimeMillis" : 36, "totalKeysExamined" : 10000, "totalDocsExamined" : 10000 #Indexes-Compound example 17
MADRID · NOV 18-19 · 2016 #Indexes-Covered Queries 18 ● They return results from the indexes ● A covered query does not read from the document level: ○ ndocsExamined : 0 ● In the index must be all the fields: ○ Included in the query ○ Returned by the query
MADRID · NOV 18-19 · 2016 #Indexes-Covered Queries example 19 > use test > db.example.createIndex( { a : 1, b : 1 } ); > db.example.explain("executionStats").find( { a : 56, b : 87 } ); "nReturned" : 100, "executionTimeMillis" : 3, "totalKeysExamined" : 100, "totalDocsExamined" : 100 > db.example.explain("executionStats").find( { a : 56, b : 87 }, { _id : 0, a : 1, b : 1 } ); "nReturned" : 100, "executionTimeMillis" : 3, "totalKeysExamined" : 100, "totalDocsExamined" : 0
MADRID · NOV 18-19 · 2016 #Indexes-TTL 20 ● TTL (TimeToLive) Indexes ○ Allow the user to specify a period of time after which the data will be automatically deleted
MADRID · NOV 18-19 · 2016 #Indexes-TTL example 21 > use test > db.timetolive.createIndex( { date : 1 } , { expireAfterSeconds: 20 } ); > db.timetolive.insert( { date : new Date() } ); > db.timetolive.find().pretty() { "_id" : ObjectId("581fb43f5626d57bcf66fe73"), "date" : ISODate("2016-11-06T22:52:47.373Z") } > db.timetolive.count() 0 > The background task that removes expired documents runs every 60 seconds.
MADRID · NOV 18-19 · 2016 #Indexes-Partial 22 ● Partial Indexes ○ Allow to index a subset of your data by specifying a filtering expression during the index creation ○ Better query performance while reducing system overhead
MADRID · NOV 18-19 · 2016 #Indexes-Partial example > use test > db.sysprofile.createIndex( { ns : 1, op : 1 }, { partialFilterExpression : { millis : { $gt : 10000 } } } ); > db.sysprofile.explain().find( { ns : 'school2.students', op : 'query' }, { millis : 1 } ); "winningPlan" : { "inputStage" : { "stage" : "COLLSCAN" > db.sysprofile.explain().find( { ns : 'school2.students', op : 'query', millis : { $gte : 20000 } }, { millis : 1 } ); "winningPlan" : { "inputStage" : { "stage" : "IXSCAN" 23
MADRID · NOV 18-19 · 2016 #Indexes-Sparse 24 ● Sparse Indexes ○ Only contain entries for documents that contain a specific field ○ Smaller and more efficient indexes
MADRID · NOV 18-19 · 2016 > use test > db.test.createIndex( { b : 1 } , { sparse : true } ); > db.test.stats() 16384 > > db.test.createIndex( { b : 1 } ); > db.test.stats() 2580480 > #Indexes-Sparse example 25
MADRID · NOV 18-19 · 2016 #Storage Engines-WiredTiger 26 ● Reduced storage space and higher I/O scalability ● It compresses indexes into RAM, freeing up Working Set for documents ● Performance (granular concurrency control) ● Native compression: ○ per collection ○ per index
MADRID · NOV 18-19 · 2016 #Storage Engines-WiredTiger 27 ● Types of compression: ○ Snappy (default): Data and Journal ○ zlib: maximum compression ○ Prefix: for indexes
MADRID · NOV 18-19 · 2016 #Storage Engines: In-Memory 28 ● Extreme performance coupled with real time analytics for the most demanding, latency-sensitive applications ● Delivers the extreme throughput and predictable latency ● Eliminates the need for separate caching layers
MADRID · NOV 18-19 · 2016 #Query Optimizer 29 ● It selects the best index to use ● It periodically runs alternate query plans and selects the index with the best response time for each query type ● The results of these tests are stored as a cached query plan and are updated periodically
MADRID · NOV 18-19 · 2016 #Aggregation Framework 30
MADRID · NOV 18-19 · 2016 #Aggregation Framework 31 ● Use $indexStats to determine how frequently your index is used ● Each shard has its own statistics ● Stats are reset when you restart your mongod service >db.collection.aggregate([ { $indexStats : {} } ])
MADRID · NOV 18-19 · 2016 #Aggregation Framework example 32 > db.test.aggregate([ { $indexStats : {} } ]).pretty() { "name" : "b_1", "key" : { "b" : 1 }, "host" : "hostname:27017", "accesses" : { "ops" : NumberLong(4), "since" : ISODate("2016-11-07T00:05:39.771Z") } } ...
MADRID · NOV 18-19 · 2016 #Tools-Compass 33
MADRID · NOV 18-19 · 2016 #Tools-Compass 34 ● MongoDB Compass ○ Analyze the size and usage of the indexes ○ It offers visual plan explanations ○ Index creation ○ Real time stats ○ Identify and fix performance and data problems
MADRID · NOV 18-19 · 2016 #Tools-Ops Manager 35 ● The Visual Query Profiler allows to analyze a query performance and recommends on the addition of indexes ● It can be used to visualize output from the profiler when identifying slow queries
MADRID · NOV 18-19 · 2016 #Tools-Cloud Manager & Atlas 36 ● Use the Query Targeting chart
MADRID · NOV 18-19 · 2016 #Tools-Profiling 37 ● Guess the slowest queries for improving their indexes ● Log information for all events or only those whose duration exceeds a configurable threshold (100ms default)
MADRID · NOV 18-19 · 2016 #Tools-Profiling example 38 > db.setProfilingLevel(2) { "was" : 0, "slowms" : 100, "ok" : 1 } > db.example.find( { c : 98 } ); > db.system.profile.find().pretty() { "op" : "query", "ns" : "test.example", "query" : { "find" : "example", "filter" : { "c" : 98 } }, … "millis" : 6
MADRID · NOV 18-19 · 2016 #Tuning-SSD 39 ● Most disk access patterns are not sequential ● Data files benefit from SSDs ● MongoDB journal files have high sequential write patterns (good for spinning disks) ● RAM/SSDs better than spinning drives ● Use high performance storage ● Avoid networked storage
MADRID · NOV 18-19 · 2016 #Tuning-CPU 40 ● MongoDB has better performance on faster CPUs ● The MongoDB WiredTiger Storage Engine is better able to saturate multi-core processor resources than the MMAPv1 Storage Engine
MADRID · NOV 18-19 · 2016 #Tuning-Thread 41 ● The WiredTiger Storage Engine is multi-threaded and can take advantage of many CPU Cores
MADRID · NOV 18-19 · 2016 #Sharding 42 ● The balancer has been moved from the mongos to the primary member of the config server ● Sharding improves the performance of reads ● Benefits (even with a ‘poor’ shard key and scatter-gather): ○ A larger amount of memory to store indexes ○ Parallelize queries across shards reducing latency
MADRID · NOV 18-19 · 2016 #Storage & Compression types 43 ● Balance query latency with storage density and cost by assigning data sets based on a value such as a timestamp to specific storage devices
MADRID · NOV 18-19 · 2016 #Storage & Compression types 44 ● Recent, frequently accessed data can be assigned to high performance SSDs with Snappy compression ● Older, less frequently accessed data is tagged to lower-throughput hard disk drives compressed with zlib to attain maximum storage density with a lower cost-per-bit ● As data ages, MongoDB will automatically migrates it between shards
MADRID · NOV 18-19 · 2016 #Questions? 45 Questions?
MongoDB: Improve the performance of your application by @juanroycouto MADRID · NOV 18-19 · 2016 46 Thank you for your attention!

Mongo db improve the performance of your application codemotion2016

  • 1.
    MADRID · NOV18-19 · 2016 19th November 2016 18:30-19:15 MongoDB: Improve the performance of your application by @juanroycouto @codemotion_es 1
  • 2.
    MADRID · NOV18-19 · 2016 #whoami ❖ @juanroycouto ❖ www.juanroy.es ❖ www.mongodbspain.com/en/author/juanroy ❖ juanroycouto@gmail.com ❖ MongoDB DBA at Grupo Undanet 2
  • 3.
    MADRID · NOV18-19 · 2016 #Agenda 3 ● Data Model ● Application Patterns ● .explain() ● Indexes ● Storage Engines ● Query Optimizer ● Aggregation Framework ● Profiling ● Tuning ● Sharding ● Storage & Compression
  • 4.
    MADRID · NOV18-19 · 2016 #Data Model 4
  • 5.
    MADRID · NOV18-19 · 2016 #Data Model 5 ● All the information in one document: ○ No joins ○ Single read ○ Higher performance ○ ACID compliance at the document level ● MMAPv1: avoid unbounded document growth ● WiredTiger rewrites the document for each update
  • 6.
    MADRID · NOV18-19 · 2016 #Application Patterns 6 ● Modify only those fields that have changed ● Project only the fields you need ● Avoid negation in queries ● Use Covered Queries when possible ● Avoid scatter-gather queries ● Write Concern: ○ w : 0 → faster than w : all
  • 7.
    MADRID · NOV18-19 · 2016 #.explain() 7 ● Use .explain() to get the time a query needs ● Verbose options: ○ queryPlanner (default) ○ executionStats (stats for the winner plan) ○ allPlansExecution (stats for all candidate plans)
  • 8.
    MADRID · NOV18-19 · 2016 #.explain() 8 ● In-memory sorting data is limited to 32MB (you need an index) ● My index selectivity is the relationship between nReturned and totalKeysExamined
  • 9.
    MADRID · NOV18-19 · 2016 #Indexes 9
  • 10.
    MADRID · NOV18-19 · 2016 #Indexes 10 ● Improve the performance of read-intensive apps ● Overhead: ○ Write operations ○ Disk usage ○ Memory consumption
  • 11.
    MADRID · NOV18-19 · 2016 ● Working Set is made of: ○ Indexes ○ Your most frequently accessed data #Indexes-Working Set 11
  • 12.
    MADRID · NOV18-19 · 2016 ● If your Working Set does not fit into RAM: ○ Increment your RAM ○ Shard the database ○ Avoid scan all documents in the collection ■ Page faults ■ You will fill in your Working Set with infrequent accessed data #Indexes-Working Set 12
  • 13.
    MADRID · NOV18-19 · 2016 #Indexes-Before creating indexes 13 ● Understand your data ● How your queries will be ● Create as few indexes as possible ● Put your highly selective fields first in the index ● Check your current indexes before creating new ones
  • 14.
    MADRID · NOV18-19 · 2016 #Indexes-Advices 14 ● Do not index a little cardinality field ● Check your index performance ● Monitor your system to check if: ○ Your strategy is still up to date ○ You need new indexes ○ You can remove unnecessary indexes
  • 15.
    MADRID · NOV18-19 · 2016 #Indexes-Compound 15 ● A compound index has better performance than: ○ Single indexes ○ Intersection ● Remove indexes that are prefixes of other ● The value of the combined fields must have high cardinality
  • 16.
    MADRID · NOV18-19 · 2016 #Indexes-Compound 16 >db.collection.createIndex( { equality, sort, range } ); >db.collection.find( { book_author : ‘John’, #equality book_id : { $gt : 54 } #range } ) .sort( { book_price : 1 } ); #sort
  • 17.
    MADRID · NOV18-19 · 2016 > use test > db.example.createIndex( { a : 1, b : 1 } ); > db.example.explain("executionStats").find( { a : 56, b : 87 } ); "nReturned" : 100, "executionTimeMillis" : 2, "totalKeysExamined" : 100, "totalDocsExamined" : 100, > db.example.dropIndex( { a : 1, b : 1 } ); > db.example.createIndex( { a : 1 } ); > db.example.createIndex( { b : 1 } ); > db.example.explain("executionStats").find( { a : 56, b : 87 } ); "nReturned" : 100, "executionTimeMillis" : 36, "totalKeysExamined" : 10000, "totalDocsExamined" : 10000 #Indexes-Compound example 17
  • 18.
    MADRID · NOV18-19 · 2016 #Indexes-Covered Queries 18 ● They return results from the indexes ● A covered query does not read from the document level: ○ ndocsExamined : 0 ● In the index must be all the fields: ○ Included in the query ○ Returned by the query
  • 19.
    MADRID · NOV18-19 · 2016 #Indexes-Covered Queries example 19 > use test > db.example.createIndex( { a : 1, b : 1 } ); > db.example.explain("executionStats").find( { a : 56, b : 87 } ); "nReturned" : 100, "executionTimeMillis" : 3, "totalKeysExamined" : 100, "totalDocsExamined" : 100 > db.example.explain("executionStats").find( { a : 56, b : 87 }, { _id : 0, a : 1, b : 1 } ); "nReturned" : 100, "executionTimeMillis" : 3, "totalKeysExamined" : 100, "totalDocsExamined" : 0
  • 20.
    MADRID · NOV18-19 · 2016 #Indexes-TTL 20 ● TTL (TimeToLive) Indexes ○ Allow the user to specify a period of time after which the data will be automatically deleted
  • 21.
    MADRID · NOV18-19 · 2016 #Indexes-TTL example 21 > use test > db.timetolive.createIndex( { date : 1 } , { expireAfterSeconds: 20 } ); > db.timetolive.insert( { date : new Date() } ); > db.timetolive.find().pretty() { "_id" : ObjectId("581fb43f5626d57bcf66fe73"), "date" : ISODate("2016-11-06T22:52:47.373Z") } > db.timetolive.count() 0 > The background task that removes expired documents runs every 60 seconds.
  • 22.
    MADRID · NOV18-19 · 2016 #Indexes-Partial 22 ● Partial Indexes ○ Allow to index a subset of your data by specifying a filtering expression during the index creation ○ Better query performance while reducing system overhead
  • 23.
    MADRID · NOV18-19 · 2016 #Indexes-Partial example > use test > db.sysprofile.createIndex( { ns : 1, op : 1 }, { partialFilterExpression : { millis : { $gt : 10000 } } } ); > db.sysprofile.explain().find( { ns : 'school2.students', op : 'query' }, { millis : 1 } ); "winningPlan" : { "inputStage" : { "stage" : "COLLSCAN" > db.sysprofile.explain().find( { ns : 'school2.students', op : 'query', millis : { $gte : 20000 } }, { millis : 1 } ); "winningPlan" : { "inputStage" : { "stage" : "IXSCAN" 23
  • 24.
    MADRID · NOV18-19 · 2016 #Indexes-Sparse 24 ● Sparse Indexes ○ Only contain entries for documents that contain a specific field ○ Smaller and more efficient indexes
  • 25.
    MADRID · NOV18-19 · 2016 > use test > db.test.createIndex( { b : 1 } , { sparse : true } ); > db.test.stats() 16384 > > db.test.createIndex( { b : 1 } ); > db.test.stats() 2580480 > #Indexes-Sparse example 25
  • 26.
    MADRID · NOV18-19 · 2016 #Storage Engines-WiredTiger 26 ● Reduced storage space and higher I/O scalability ● It compresses indexes into RAM, freeing up Working Set for documents ● Performance (granular concurrency control) ● Native compression: ○ per collection ○ per index
  • 27.
    MADRID · NOV18-19 · 2016 #Storage Engines-WiredTiger 27 ● Types of compression: ○ Snappy (default): Data and Journal ○ zlib: maximum compression ○ Prefix: for indexes
  • 28.
    MADRID · NOV18-19 · 2016 #Storage Engines: In-Memory 28 ● Extreme performance coupled with real time analytics for the most demanding, latency-sensitive applications ● Delivers the extreme throughput and predictable latency ● Eliminates the need for separate caching layers
  • 29.
    MADRID · NOV18-19 · 2016 #Query Optimizer 29 ● It selects the best index to use ● It periodically runs alternate query plans and selects the index with the best response time for each query type ● The results of these tests are stored as a cached query plan and are updated periodically
  • 30.
    MADRID · NOV18-19 · 2016 #Aggregation Framework 30
  • 31.
    MADRID · NOV18-19 · 2016 #Aggregation Framework 31 ● Use $indexStats to determine how frequently your index is used ● Each shard has its own statistics ● Stats are reset when you restart your mongod service >db.collection.aggregate([ { $indexStats : {} } ])
  • 32.
    MADRID · NOV18-19 · 2016 #Aggregation Framework example 32 > db.test.aggregate([ { $indexStats : {} } ]).pretty() { "name" : "b_1", "key" : { "b" : 1 }, "host" : "hostname:27017", "accesses" : { "ops" : NumberLong(4), "since" : ISODate("2016-11-07T00:05:39.771Z") } } ...
  • 33.
    MADRID · NOV18-19 · 2016 #Tools-Compass 33
  • 34.
    MADRID · NOV18-19 · 2016 #Tools-Compass 34 ● MongoDB Compass ○ Analyze the size and usage of the indexes ○ It offers visual plan explanations ○ Index creation ○ Real time stats ○ Identify and fix performance and data problems
  • 35.
    MADRID · NOV18-19 · 2016 #Tools-Ops Manager 35 ● The Visual Query Profiler allows to analyze a query performance and recommends on the addition of indexes ● It can be used to visualize output from the profiler when identifying slow queries
  • 36.
    MADRID · NOV18-19 · 2016 #Tools-Cloud Manager & Atlas 36 ● Use the Query Targeting chart
  • 37.
    MADRID · NOV18-19 · 2016 #Tools-Profiling 37 ● Guess the slowest queries for improving their indexes ● Log information for all events or only those whose duration exceeds a configurable threshold (100ms default)
  • 38.
    MADRID · NOV18-19 · 2016 #Tools-Profiling example 38 > db.setProfilingLevel(2) { "was" : 0, "slowms" : 100, "ok" : 1 } > db.example.find( { c : 98 } ); > db.system.profile.find().pretty() { "op" : "query", "ns" : "test.example", "query" : { "find" : "example", "filter" : { "c" : 98 } }, … "millis" : 6
  • 39.
    MADRID · NOV18-19 · 2016 #Tuning-SSD 39 ● Most disk access patterns are not sequential ● Data files benefit from SSDs ● MongoDB journal files have high sequential write patterns (good for spinning disks) ● RAM/SSDs better than spinning drives ● Use high performance storage ● Avoid networked storage
  • 40.
    MADRID · NOV18-19 · 2016 #Tuning-CPU 40 ● MongoDB has better performance on faster CPUs ● The MongoDB WiredTiger Storage Engine is better able to saturate multi-core processor resources than the MMAPv1 Storage Engine
  • 41.
    MADRID · NOV18-19 · 2016 #Tuning-Thread 41 ● The WiredTiger Storage Engine is multi-threaded and can take advantage of many CPU Cores
  • 42.
    MADRID · NOV18-19 · 2016 #Sharding 42 ● The balancer has been moved from the mongos to the primary member of the config server ● Sharding improves the performance of reads ● Benefits (even with a ‘poor’ shard key and scatter-gather): ○ A larger amount of memory to store indexes ○ Parallelize queries across shards reducing latency
  • 43.
    MADRID · NOV18-19 · 2016 #Storage & Compression types 43 ● Balance query latency with storage density and cost by assigning data sets based on a value such as a timestamp to specific storage devices
  • 44.
    MADRID · NOV18-19 · 2016 #Storage & Compression types 44 ● Recent, frequently accessed data can be assigned to high performance SSDs with Snappy compression ● Older, less frequently accessed data is tagged to lower-throughput hard disk drives compressed with zlib to attain maximum storage density with a lower cost-per-bit ● As data ages, MongoDB will automatically migrates it between shards
  • 45.
    MADRID · NOV18-19 · 2016 #Questions? 45 Questions?
  • 46.
    MongoDB: Improve the performanceof your application by @juanroycouto MADRID · NOV 18-19 · 2016 46 Thank you for your attention!