Quick Overview on MongoDB
Eman Abdel Ghaffar
Agenda
1. Introduction
2. CRUD
3. Cursors
4. Indexing
5. Schema Design principles
6. Aggregation
7. Map-Reduce
Introduction - ACID
● Relational databases usually guarantee the ACID properties, which describe how reliably transactions (both reads and writes) are processed.
● Many NoSQL databases trade off full ACID compliance for other properties, such as higher availability and horizontal scalability. MongoDB is one of the most widely used of them; since version 4.0 it also supports multi-document ACID transactions.
● https://dzone.com/articles/how-acid-mongodb
Introduction - ACID
● Atomicity requires that each transaction is executed in its entirety, or fails without any change being applied.
● Consistency requires that the database only moves from one valid state to the next, without intermediate points. Any data written to the database must be valid according to all defined rules, including constraints, cascades and triggers.
● Isolation requires that if transactions are executed concurrently, the result is equivalent to some serial execution of them.
● Durability means that the result of a committed transaction is permanent, even if the database crashes immediately afterwards or loses power.
Introduction - CAP
● The CAP theorem: “It is impossible for a distributed data store to simultaneously provide more than two out of the following three guarantees.”
● Consistency: every read receives the most recent write or an error.
● Availability: every request receives a (non-error) response, without the guarantee that it contains the most recent write.
● Partition tolerance: the system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.
Introduction - MongoDB
● MongoDB is written in C++. It was originally open source under the GNU AGPL; server releases since October 2018 are licensed under the Server Side Public License (SSPL).
● The core database server runs via an executable called mongod (mongod.exe on Windows).
● The MongoDB shell (mongo, succeeded by mongosh in newer releases) is a JavaScript-based tool for administering the database and manipulating data.
manual/reference/mongo-shell/
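CRUD - Create
● Databases and collections are created implicitly when the first document is inserted into them.
● Every MongoDB document requires a unique _id field. If a document is inserted without one, an ObjectId is generated for it automatically.
● Insert methods:
  ○ db.collection.insertOne()
  ○ db.collection.insertMany()
  ○ db.collection.insert() (deprecated in favor of insertOne() and insertMany())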
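The following toy in-memory model illustrates the two rules above. It is plain JavaScript and runs under Node.js without a MongoDB server. It is only a sketch and does not reflect how mongod works internally. The `ToyDatabase` class and its counter-based `_id` generator are hypothetical stand-ins; real MongoDB generates an ObjectId.

```javascript
// Toy in-memory model (hypothetical, NOT the MongoDB API): a collection
// exists only after its first insert, and every stored document has an _id.
class ToyDatabase {
  constructor() {
    this.collections = new Map(); // collection name -> array of documents
    this.nextId = 1;              // stand-in for ObjectId generation
  }

  listCollectionNames() {
    return [...this.collections.keys()];
  }

  insertOne(collectionName, doc) {
    // Implicitly create the collection on first insert.
    if (!this.collections.has(collectionName)) {
      this.collections.set(collectionName, []);
    }
    // Copy the document and add an _id if the caller did not supply one.
    const stored = { ...doc };
    if (stored._id === undefined) {
      stored._id = this.nextId++;
    }
    // _id must be unique within a collection (MongoDB enforces this
    // with a unique index on _id).
    const docs = this.collections.get(collectionName);
    if (docs.some((d) => d._id === stored._id)) {
      throw new Error(`duplicate _id: ${stored._id}`);
    }
    docs.push(stored);
    return { insertedId: stored._id };
  }
}

const db = new ToyDatabase();
console.log(db.listCollectionNames()); // []
const res = db.insertOne("inventory", { item: "paper", qty: 10 });
console.log(db.listCollectionNames(), res.insertedId); // [ 'inventory' ] 1
```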
CRUD - Read
● db.collection.find(query, projection)
  ○ query selects the documents; projection (optional) selects the fields to return.
● db.inventory.find( {} )
  ○ SELECT * FROM inventory
● db.inventory.find( { status: "D" } )
  ○ SELECT * FROM inventory WHERE status = 'D'
● db.inventory.find( { status: { $in: [ "A", "D" ] } } )
  ○ SELECT * FROM inventory WHERE status IN ('A', 'D')
● db.inventory.find( { status: "A", qty: { $lt: 30 } } )
  ○ SELECT * FROM inventory WHERE status = 'A' AND qty < 30
● db.inventory.find( { status: "A", $or: [ { qty: { $lt: 30 } }, { item: /^p/ } ] } )
  ○ SELECT * FROM inventory WHERE status = 'A' AND ( qty < 30 OR item LIKE 'p%' )
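The filter documents above follow a few simple rules. Top-level fields are ANDed together, `$in` tests membership, `$lt` compares values, `$or` takes a list of alternative filters, and a regular expression matches string fields. The sketch below is a toy JavaScript matcher that implements only these rules. It ignores arrays, dotted paths, BSON type ordering and everything else the real query engine handles. `matches` is a hypothetical helper, not part of MongoDB, and the sample documents are made up.

```javascript
// Toy matcher for the subset of query syntax used above: implicit AND,
// equality, $in, $lt, $or and regular expressions. Hypothetical helper;
// it ignores arrays, dotted paths and BSON type ordering.
function matchesCondition(value, cond) {
  if (cond instanceof RegExp) {
    return typeof value === "string" && cond.test(value);
  }
  if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
    // Operator document such as { $lt: 30 } or { $in: [...] }.
    return Object.entries(cond).every(([op, arg]) => {
      switch (op) {
        case "$in":
          return arg.includes(value);
        case "$lt":
          return value != null && value < arg; // missing/null never matches
        default:
          throw new Error(`unsupported operator ${op}`);
      }
    });
  }
  return value === cond; // plain equality
}

function matches(doc, filter) {
  // Every top-level entry must hold (implicit AND).
  return Object.entries(filter).every(([key, cond]) =>
    key === "$or"
      ? cond.some((sub) => matches(doc, sub))
      : matchesCondition(doc[key], cond)
  );
}

// Made-up sample documents.
const inventory = [
  { item: "journal", qty: 25, status: "A" },
  { item: "notebook", qty: 50, status: "A" },
  { item: "paper", qty: 100, status: "D" },
  { item: "planner", qty: 75, status: "D" },
  { item: "postcard", qty: 45, status: "A" },
];

// Equivalent of the last query above.
const filter = { status: "A", $or: [{ qty: { $lt: 30 } }, { item: /^p/ }] };
console.log(inventory.filter((doc) => matches(doc, filter)).map((doc) => doc.item));
// [ 'journal', 'postcard' ]
```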
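CRUD - Update
● Some update operators:
  ○ $currentDate: sets a field to the current date
  ○ $inc: increments a field by a given amount
  ○ $min: updates a field only if the given value is less than the current one
  ○ $max: updates a field only if the given value is greater than the current one
  ○ $mul: multiplies a field by a given number
  ○ $rename: renames a field
  ○ $set: sets the value of a field
● Update methods:
  ○ db.collection.update() (deprecated in favor of updateOne() and updateMany())
  ○ db.collection.findAndModify()
  ○ db.collection.updateOne()
  ○ db.collection.updateMany()
  ○ db.collection.replaceOne()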
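Each update operator describes a change to apply to a document rather than a full replacement. The sketch below is a toy JavaScript function, `applyUpdate`, that applies the operators listed above to a plain object and returns a new one. It is hypothetical and handles top-level fields only, with no dotted paths or arrays. Note that real MongoDB rejects an update that targets the same field with two operators; this sketch does not check for that.

```javascript
// Toy application of update operators to one document (top-level fields
// only). Hypothetical helper, not the MongoDB implementation; it does not
// detect two operators touching the same field.
function applyUpdate(doc, update, now = new Date()) {
  const out = { ...doc }; // never mutate the input document
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, arg] of Object.entries(fields)) {
      switch (op) {
        case "$set":
          out[field] = arg;
          break;
        case "$inc": // a missing field is treated as 0
          out[field] = (out[field] ?? 0) + arg;
          break;
        case "$mul": // a missing field ends up as 0
          out[field] = (out[field] ?? 0) * arg;
          break;
        case "$min": // only lowers the value (or sets it if missing)
          if (out[field] === undefined || arg < out[field]) out[field] = arg;
          break;
        case "$max": // only raises the value (or sets it if missing)
          if (out[field] === undefined || arg > out[field]) out[field] = arg;
          break;
        case "$rename": // move the value under the new name, if present
          if (field in out) {
            out[arg] = out[field];
            delete out[field];
          }
          break;
        case "$currentDate": // this sketch supports only { field: true }
          if (arg !== true) throw new Error("only { field: true } is supported");
          out[field] = now;
          break;
        default:
          throw new Error(`unsupported operator ${op}`);
      }
    }
  }
  return out;
}

const before = { _id: 1, item: "paper", qty: 100, price: 10 };
const after = applyUpdate(before, {
  $inc: { qty: -5 },
  $mul: { price: 2 },
  $rename: { item: "name" },
});
console.log(after); // { _id: 1, qty: 95, price: 20, name: 'paper' }
```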
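CRUD - Delete
● Indexes
  ○ Delete operations do not drop indexes, even if they delete all documents from a collection.
● Atomicity
  ○ All write operations in MongoDB are atomic at the level of a single document, even when the operation modifies several embedded documents within it. Atomicity across several documents requires a multi-document transaction (MongoDB 4.0 and later).
● Delete methods:
  ○ db.collection.remove() (deprecated in favor of deleteOne() and deleteMany())
  ○ db.collection.deleteOne()
  ○ db.collection.deleteMany()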
Cursors
● Cursors, found in many database systems, return query result sets iteratively, in batches, for efficiency.
● A query instantiates a cursor, which is then used to retrieve the result set in manageable chunks. The driver makes successive calls to MongoDB as needed to refill its cursor buffer.
● Returning a huge result all at once would mean:
  ○ Copying all that data into memory.
  ○ Transferring it over the wire.
  ○ Deserializing it on the client side.
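The batching idea can be sketched with a JavaScript generator. The consumer iterates one document at a time. A fetch function is called only when the local buffer has been drained; it stands in for a round trip to the server (MongoDB's getMore command plays this role). The names `toyCursor` and `fetchFromServer` and the batch size of 3 are hypothetical choices for illustration; real drivers and servers pick their own batch sizes.

```javascript
// Toy cursor: documents are pulled from a (simulated) server in batches
// and handed to the caller one at a time. All names are hypothetical.
function* toyCursor(fetchBatch, batchSize) {
  let offset = 0;
  while (true) {
    const batch = fetchBatch(offset, batchSize); // one simulated round trip
    if (batch.length === 0) return;              // nothing left
    offset += batch.length;
    yield* batch;                                // serve from the local buffer
    // This sketch assumes a short batch means the result set is exhausted.
    if (batch.length < batchSize) return;
  }
}

// Simulated server-side result set of 7 documents.
const results = Array.from({ length: 7 }, (_, i) => ({ _id: i }));
let roundTrips = 0;
function fetchFromServer(offset, size) {
  roundTrips++;
  return results.slice(offset, offset + size);
}

const cursor = toyCursor(fetchFromServer, 3);
console.log(cursor.next().value); // { _id: 0 }
console.log(roundTrips);          // 1: only the first batch has been fetched
```

Because the generator is lazy, no data is requested until the caller asks for the first document. Draining the remaining six documents triggers only two more round trips.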
Indexing
● Introduction
● Index Types
● Index Properties
Indexing - Introduction
● Index keys are typically smaller than the documents they catalog, and indexes are typically available in RAM or located sequentially on disk.
● Covered queries
  ○ A query is covered when both its query criteria and its projection include only indexed fields (the projection must also exclude _id unless _id is part of the index).
  ○ Results are returned directly from the index, without scanning any documents or bringing documents into memory.
● Ensure indexes fit in RAM
  ○ Use the db.collection.totalIndexSize() helper, which returns the total index size in bytes.
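The toy JavaScript sketch below shows why a covered query never needs to touch the documents. It models a single-field index as a sorted array of `{ key, pos }` entries that point back into the document array, and it uses binary search for lookups. Real MongoDB indexes are B-trees. `buildIndex`, `lowerBound` and `findEq` are hypothetical helpers, and the "only the indexed field is projected" flag is a simplification of the real covering rules.

```javascript
// Toy single-field index: sorted { key, pos } entries pointing back into
// the document array. Hypothetical; real MongoDB indexes are B-trees.
function buildIndex(docs, field) {
  return docs
    .map((doc, pos) => ({ key: doc[field], pos }))
    .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Binary search for the first entry whose key is >= target.
function lowerBound(index, target) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].key < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Equality lookup. When only the indexed field is requested ("covered"),
// results come from the index entries alone; otherwise each match
// requires reading the full document.
function findEq(docs, index, field, value, onlyIndexedField) {
  const out = [];
  let docsRead = 0;
  for (let i = lowerBound(index, value); i < index.length && index[i].key === value; i++) {
    if (onlyIndexedField) {
      out.push({ [field]: index[i].key });
    } else {
      out.push(docs[index[i].pos]);
      docsRead++;
    }
  }
  return { out, docsRead };
}

const sample = [
  { item: "journal", status: "A" },
  { item: "paper", status: "D" },
  { item: "postcard", status: "A" },
];
const statusIndex = buildIndex(sample, "status");
const covered = findEq(sample, statusIndex, "status", "A", true);
console.log(covered.out.length, covered.docsRead); // 2 0
```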
Indexing - Index Types
● Single Field
● Compound Index
● Multikey Index
● Geospatial Index
● Text Indexes
● Hashed Indexes
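The hypothetical helper `indexKeys` below is a toy JavaScript sketch of how single-field, compound and multikey indexes differ. It lists the index keys that one document contributes. A compound index produces one key made of several field values. A multikey index produces one key per array element. The sketch also mirrors MongoDB's rule that a compound multikey index cannot index two array fields ("parallel arrays") in the same document. Empty arrays and nested paths are not handled.

```javascript
// Toy key generation for single-field, compound and multikey indexes.
// Hypothetical helper: returns the index keys one document contributes.
// Empty arrays and dotted paths are not handled.
function indexKeys(doc, fields) {
  const arrayFields = fields.filter((f) => Array.isArray(doc[f]));
  if (arrayFields.length > 1) {
    // MongoDB refuses compound multikey keys over parallel arrays.
    throw new Error("cannot index parallel arrays: " + arrayFields.join(", "));
  }
  // A missing field is indexed as null.
  const base = fields.map((f) => (doc[f] === undefined ? null : doc[f]));
  if (arrayFields.length === 0) return [base];
  // Multikey: one key per distinct array element.
  const i = fields.indexOf(arrayFields[0]);
  const elements = [...new Set(doc[arrayFields[0]])];
  return elements.map((el) => base.map((v, j) => (j === i ? el : v)));
}

console.log(indexKeys({ item: "paper", qty: 5 }, ["item", "qty"]));
// [ [ 'paper', 5 ] ]
console.log(indexKeys({ item: "paper", tags: ["red", "blank"] }, ["tags"]));
// [ [ 'red' ], [ 'blank' ] ]
```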
Indexing - Index Properties
● TTL Indexes
  ○ A TTL index on a date field makes MongoDB automatically remove documents after a specified period of time (expireAfterSeconds).
● Unique Indexes
  ○ A unique index causes MongoDB to reject any insert or update that would create a duplicate value for the indexed field.
● Partial Indexes
  ○ A partial index indexes only the documents that meet a specified filter expression.
● Case Insensitive Indexes
  ○ A case insensitive index (created by specifying a collation with strength 1 or 2) disregards the case of the index key values.
● Sparse Indexes
  ○ A sparse index does not index documents that do not have the indexed field.
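The index properties above boil down to simple rules about which documents get an entry or are kept. The toy JavaScript predicates below are a rough sketch of four of them. The helper names are hypothetical and are not MongoDB APIs. The TTL check assumes the indexed field holds a single Date.

```javascript
// Toy predicates for some index properties (hypothetical helpers,
// not MongoDB APIs).

// Unique: reject a document whose value duplicates an existing one.
// A missing field counts as null, so two documents missing it collide.
function violatesUnique(existingDocs, newDoc, field) {
  const key = (d) => (d[field] === undefined ? null : d[field]);
  return existingDocs.some((d) => key(d) === key(newDoc));
}

// Partial: only documents matching the filter predicate get an entry.
function inPartialIndex(doc, filterPredicate) {
  return filterPredicate(doc);
}

// Sparse: documents lacking the indexed field get no entry.
function inSparseIndex(doc, field) {
  return doc[field] !== undefined;
}

// TTL: eligible for removal once dateField + expireAfterSeconds is in the
// past. This sketch treats anything but a single Date as never expiring.
function ttlExpired(doc, dateField, expireAfterSeconds, now) {
  const value = doc[dateField];
  if (!(value instanceof Date)) return false;
  return value.getTime() + expireAfterSeconds * 1000 < now.getTime();
}

const now = new Date("2024-01-01T00:10:00Z");
const session = { createdAt: new Date("2024-01-01T00:00:00Z") };
console.log(ttlExpired(session, "createdAt", 300, now)); // true: 10 minutes old, 5-minute TTL
```

In MongoDB itself, removal is performed by a background task that runs periodically (every 60 seconds). Expired documents can therefore linger briefly after their expiry time.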
Schema Design principles
● Introduction
● Embedding Vs. Referencing
● Model One-to-One Relationships
● Model One-to-Many Relationships
Schema Design principles - Introduction
● The application's data access patterns should govern schema design, which requires a specific understanding of:
  ○ The read/write ratio of database operations.
  ○ The types of queries and updates performed by the database.
  ○ The lifecycle of the data and the growth rate of documents.
● When designing a data model, consider how applications will use your database.
  ○ If your application only uses recently inserted documents, consider using capped collections.
data-modeling
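A capped collection preserves insertion order and, once full, overwrites its oldest documents. That makes it a good fit for "only recent documents matter" workloads such as logs. The toy JavaScript sketch below bounds the collection by document count for simplicity. Real capped collections are bounded by a size in bytes, with an optional maximum document count. `ToyCappedCollection` is a hypothetical name.

```javascript
// Toy capped collection bounded by document count (real capped
// collections are bounded by size in bytes, with an optional max count).
class ToyCappedCollection {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = []; // kept in insertion (natural) order
  }

  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) {
      this.docs.shift(); // evict the oldest document (O(n); fine for a toy)
    }
  }

  // The n most recent documents, newest first.
  latest(n) {
    if (n <= 0) return [];
    return this.docs.slice(-n).reverse();
  }
}

const log = new ToyCappedCollection(3);
for (let i = 1; i <= 5; i++) log.insert({ msg: `event ${i}` });
console.log(log.latest(2).map((d) => d.msg)); // [ 'event 5', 'event 4' ]
```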
Embedding Vs. Referencing
Embedding Vs. Referencing
● Embedding provides better performance for read operations, as well as the ability to request and retrieve related data in a single database operation.
● Not all 1:1 or 1:Many relationships should be embedded in a single document.
Embedding Vs. Referencing
● References store the relationships between data by including links or references from one document to another. Use references:
  ○ When embedding would not provide sufficient read performance advantages.
  ○ Where the object is referenced from many different sources.
  ○ To represent complex many-to-many relationships.
  ○ To model large, hierarchical data sets.
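Here is the trade-off above, shown with plain JavaScript objects standing in for documents. All the data is illustrative. With embedding, one read returns the related data. With referencing, shared data is stored once, but following the reference costs an extra lookup. In MongoDB that lookup would be a second query or a $lookup aggregation stage; the `withPublisher` function here is a hypothetical stand-in for it.

```javascript
// Embedded (illustrative data): the address lives inside the patron
// document, so a single read returns both.
const patron = {
  _id: "joe",
  name: "Joe",
  address: { street: "1 Example Street", city: "Exampleton" },
};

// Referenced: the publisher is stored once and each book points to it
// by _id, instead of repeating the publisher data in every book.
const publishers = [{ _id: "pub1", name: "Example Press" }];
const books = [
  { _id: 1, title: "Book One", publisher_id: "pub1" },
  { _id: 2, title: "Book Two", publisher_id: "pub1" },
];

// Following a reference costs an extra lookup (hypothetical helper standing
// in for a second query or a $lookup stage).
function withPublisher(book, publishersById) {
  return { ...book, publisher: publishersById.get(book.publisher_id) ?? null };
}

const publishersById = new Map(publishers.map((p) => [p._id, p]));
console.log(patron.address.city); // Exampleton
console.log(books.map((b) => withPublisher(b, publishersById).publisher.name));
// [ 'Example Press', 'Example Press' ]
```

Because the publisher exists in only one place, renaming it is a single update that every book sees. An embedded copy would have to be updated in each book.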
One-to-One Relationships - Embedding
One-to-Many Relationships - One-to-Many and One-to-Few
One-to-Many Relationships - One-to-Squillions
Aggregation
Aggregation
● Aggregation operations group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result.
● The aggregate command operates on a single collection, logically passing the entire collection into the aggregation pipeline.
● The $match and $sort pipeline stages can take advantage of an index when they occur at the beginning of the pipeline.
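A pipeline is a list of stages, and each stage transforms the array of documents produced by the previous one. The toy JavaScript runner below is a rough sketch that supports only simple forms of `$match` (equality), `$group` (a `"$field"` group key with `$sum` accumulators) and `$sort` (one field). It does not use indexes. Still, it shows why putting `$match` first helps: every later stage sees fewer documents. `runPipeline` is a hypothetical helper and the order data is made up.

```javascript
// Toy pipeline runner: each stage transforms the output of the previous
// one. Hypothetical; supports only $match (equality), $group (group key
// given as "$field", $sum accumulators) and $sort (single field).
function runPipeline(docs, pipeline) {
  return pipeline.reduce((input, stage) => {
    const [name, spec] = Object.entries(stage)[0];
    switch (name) {
      case "$match":
        return input.filter((d) => Object.entries(spec).every(([k, v]) => d[k] === v));
      case "$group": {
        const groups = new Map();
        for (const d of input) {
          const key = d[spec._id.slice(1)]; // "$cust_id" -> d.cust_id
          if (!groups.has(key)) groups.set(key, { _id: key });
          const g = groups.get(key);
          for (const [outField, acc] of Object.entries(spec)) {
            if (outField === "_id") continue;
            // $sum of "$field" adds that field; $sum of a number counts.
            const v = typeof acc.$sum === "string" ? d[acc.$sum.slice(1)] : acc.$sum;
            g[outField] = (g[outField] ?? 0) + v;
          }
        }
        return [...groups.values()];
      }
      case "$sort": {
        const [[field, dir]] = Object.entries(spec); // dir: 1 or -1
        return [...input].sort((a, b) =>
          a[field] < b[field] ? -dir : a[field] > b[field] ? dir : 0);
      }
      default:
        throw new Error(`unsupported stage ${name}`);
    }
  }, docs);
}

// Made-up sample orders.
const orders = [
  { cust_id: "A123", amount: 500, status: "A" },
  { cust_id: "A123", amount: 250, status: "A" },
  { cust_id: "B212", amount: 200, status: "A" },
  { cust_id: "A123", amount: 300, status: "D" },
];

const totals = runPipeline(orders, [
  { $match: { status: "A" } },
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
  { $sort: { total: -1 } },
]);
// totals: [{ _id: "A123", total: 750 }, { _id: "B212", total: 200 }]
console.log(totals);
```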
Aggregation https://docs.mongodb.com/manual/core/aggregation-pipeline-optimization/
Aggregation - Limitations
● If any single document in the result set exceeds the BSON Document Size limit (16 megabytes), the command produces an error.
● The $group stage has a limit of 100 megabytes of RAM. By default, if the stage exceeds this limit, $group produces an error; enabling the allowDiskUse option lets it write temporary files to disk instead.
Map-Reduce
● Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results.
● Map-reduce is less efficient and more complex than the aggregation pipeline, and it is deprecated as of MongoDB 5.0.
● All map-reduce functions in MongoDB are written in JavaScript and run within the mongod process.
● Map-reduce operations take the documents of a single collection as their input.
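The map/reduce contract can be sketched in plain JavaScript. The map function is called with `this` bound to each document and emits key/value pairs. The reduce function folds all values emitted for one key. As in MongoDB, reduce is skipped for keys that have only one value. This toy runner differs from MongoDB in one way: `emit` is passed to the map function as an argument instead of being a global. `toyMapReduce` is a hypothetical name and the data is made up.

```javascript
// Toy single-process map-reduce in the style of MongoDB's: map runs with
// `this` bound to each document and calls emit(key, value); reduce(key,
// values) folds the values for one key. Hypothetical runner; emit is passed
// as an argument here rather than being a global.
function toyMapReduce(docs, map, reduce) {
  const emitted = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit);

  const results = [];
  for (const [key, values] of emitted) {
    // Like MongoDB, do not call reduce for a key with a single value.
    const value = values.length === 1 ? values[0] : reduce(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

// Made-up sample orders.
const orders = [
  { cust_id: "A123", amount: 500 },
  { cust_id: "A123", amount: 250 },
  { cust_id: "B212", amount: 200 },
  { cust_id: "A123", amount: 300 },
];

const result = toyMapReduce(
  orders,
  function (emit) { emit(this.cust_id, this.amount); }, // `function`, not an arrow, so `this` works
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);
// result: [{ _id: "A123", value: 1050 }, { _id: "B212", value: 200 }]
console.log(result);
```

The same totals could be computed with a $match/$group pipeline, which the slides above describe as the more efficient option.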
Questions