Database Internals: A Deep Dive into How Distributed Data Systems Work 1st Edition
When it comes to choosing, using, and maintaining a database, understanding its internals is essential. But with so many distributed databases and tools available today, it’s often difficult to understand what each one offers and how they differ. With this practical guide, Alex Petrov guides developers through the concepts behind modern database and storage engine internals.
Throughout the book, you’ll explore relevant material gleaned from numerous books, papers, blog posts, and the source code of several open source databases. These resources are listed at the end of parts one and two. You’ll discover that the most significant distinctions among many modern databases reside in subsystems that determine how storage is organized and how data is distributed.
This book examines:
- Storage engines: Explore storage classification and taxonomy, and dive into B-Tree-based and immutable Log Structured storage engines, with differences and use-cases for each
- Storage building blocks: Learn how database files are organized to build efficient storage, using auxiliary data structures such as Page Cache, Buffer Pool and Write-Ahead Log
- Distributed systems: Learn step-by-step how nodes and processes connect and build complex communication patterns
- Database clusters: Which consistency models are commonly used by modern databases and how distributed storage systems achieve consistency
- ISBN-10: 1492040347
- ISBN-13: 978-1492040347
- Edition: 1st
- Publisher: O'Reilly Media
- Publication date: November 5, 2019
- Language: English
- Dimensions: 7 x 0.75 x 9 inches
- Print length: 370 pages
From the Publisher
From the Preface
Who is this book for?
In conversations at technical conferences, I often hear the same question: “How can I learn more about database internals? I don’t even know where to start.” Most of the books on database systems do not go into details of storage engine implementation, and cover the access methods, such as B-Trees, on a rather high level. There are very few books that cover more recent concepts, such as different B-Tree variants and log-structured storage, so I usually recommend reading papers.
Everyone who reads papers knows that it’s not that easy: you often lack context, the wording might be ambiguous, there’s little or no connection between papers, and they’re hard to find. This book contains concise summaries of important database systems concepts and can serve as a guide for those who’d like to dig in deeper, or as a cheat sheet for those already familiar with these concepts.
Not everyone wants to become a database developer, but this book will help people who build software that uses database systems: software developers, reliability engineers, architects, and engineering managers.
If your company depends on any infrastructure component, be it a database, a messaging queue, a container platform, or a task scheduler, you have to read the project change-logs and mailing lists to stay in touch with the community and be up-to-date with the most recent happenings in the project.
Understanding terminology and knowing what’s inside will enable you to yield more information from these sources and use your tools more productively to troubleshoot, identify, and avoid potential risks and bottlenecks. Having an overview and a general understanding of how database systems work will help in case something goes wrong. Using this knowledge, you’ll be able to form a hypothesis, validate it, find the root cause, and present it to other project maintainers.
This book is also for curious minds: for the people who like learning things without immediate necessity, those who spend their free time hacking on something fun, creating compilers, writing homegrown operating systems, text editors, computer games, learning programming languages, and absorbing new information.
The reader is assumed to have some experience with developing backend systems and working with database systems as a user. Having some prior knowledge of different data structures will help to digest material faster.
Product details
- Publisher : O'Reilly Media
- Publication date : November 5, 2019
- Edition : 1st
- Language : English
- Print length : 370 pages
- ISBN-10 : 1492040347
- ISBN-13 : 978-1492040347
- Item Weight : 2.31 pounds
- Dimensions : 7 x 0.75 x 9 inches
- Best Sellers Rank: #45,405 in Books (See Top 100 in Books)
- #5 in Data Warehousing (Books)
- #11 in Data Mining (Books)
- #12 in Data Processing
About the author

Alex is a data infrastructure engineer, database and storage systems enthusiast, Apache Cassandra committer and a PMC member. His expertise is in storage, distributed systems, and algorithms.
Customer reviews
Top reviews from the United States
- Reviewed in the United States on November 17, 2020. Format: Kindle. Verified Purchase.
Can't believe I forgot to write a review for this one!
Partly it's probably because I usually have less to say (or more precisely it's harder for me to be properly articulate) about things I like than I do about the ones I don't. And boy did I like Database Internals! I'll try my best to explain why, the book and the author surely deserve it.
Being a back-end engineer, the main reason for picking this one up was to better understand the distributed databases that I may end up in (or have already had) contact with. With that in mind, I planned on just skimming the first part of the book but imagine my surprise when I found myself Googling BW and LSM trees and going through papers comparing this and that algorithm and their impacts on memory, storage and CPU caches in multicore systems. The geek got suckered in! With my curiosity circuits pleasantly warmed by the first part, I moved on to the second part of the book - the main dish - where a similar scenario unfolded: again I swallowed up whatever was served and ended up digging for more and adding scores of books and papers to my to-read list.
All in all, reading Database Internals felt a lot like a trip to the zoo or a local museum: chock full of data structures and algorithms used by modern-day databases (and distributed systems in general), the book showcases each item in sufficient detail for you to grasp what it is about and then provides enough bibliography and reference material to last you a lifetime... or at least a couple of years.
- Reviewed in the United States on May 19, 2024. Format: Paperback. Verified Purchase.
This book took me a few years to get through. It is much more low-level than something like DDIA. Having worked on database code for the past couple of years, that context was crucial in helping me understand the book. The book is great because it covers all of the important ideas in databases and talks about the tradeoffs of using various algorithms. Succinct, detailed, and quality.
- Reviewed in the United States on November 3, 2019. Format: Paperback. Verified Purchase.
5.0 out of 5 stars: One of the Best Books out there
This is one of the best texts covering database internals. Databases are used every day, and understanding what happens under the hood is a daunting task. This book takes a pragmatic approach to the topic, starting with the basics and then taking a deeper dive into how the basic data structures and concepts come together. IMHO, this book should appeal to both database developers and engineers who want to understand how databases work. It is a must-have for engineers who really want to get into database development, and a must-have reference in general. I personally liked the book's attention to detail on what really matters when writing a real database. The concepts are equally applicable to SQL and NoSQL databases.
- Reviewed in the United States on November 30, 2019. Format: Paperback. Verified Purchase.
4.0 out of 5 stars: Summarized Recent Overview of Storage & Distributed Systems
Mastery in systems abstraction comes through a philosophical pivot. While an enthusiastic beginner considers successful "use cases", an experienced traveler - through her implicit awareness of futility against entropy - often only considers failure and just tries her best. As more systems, and more of every system, are being dictated by the twin forces of economics and architectural modernism, a much higher percentage of design and development efforts in software should be dedicated to understanding fundamentals (CPU registers, branch prediction etc.) and essential complexities (multi-node consensus, replication failures etc.). This book is a good start.
Database Internals is divided into two parts - the first deals with database storage. Especially good sections put a 9-cell flash-light on how many recent architectures are indeed built to tackle complexity bottom-up. For example, LSM (log-structured merge) trees nicely complement the "write amplification" of Solid-State Disks. The discussion on the canonical B-tree and its multiple siblings (especially Bw-tree) is very well done. The functional difference between locks and latches would be enlightening even for experienced database practitioners - locks are used to manage transactions, latches to guard the *physical* storage representation.
The second half of the book, focusing on distributed systems, is more uneven in quality. It is, however, a great start: an economized discussion of about 50 "Best Papers" on Leader Election, Failure/Crash detection, Replication, and how distributed-systems-friendly "consensus protocols" work better than atomic ones like 2-phase commit. In many ways, distributed systems have veered from monarchy (single, immutable leader deciding everything, including the next leader) to a true republic (leader is still almost omnipotent, but is regularly replaced by the constituents). The comparative analysis of Paxos, ZAB and Raft - with clear sequence diagrams - is very well done.
The quality of writing is good, though it could have benefited from more ruthless editing. The area covered is simply too broad, other than the intersection of SSDs and modern DB architecture, which is very deep and very good. Still, the book easily deserves at least 4 stars for the enthusiasm and for its good attempt to convey distributed systems pedagogy to general practitioners. Pair it with Martin Kleppmann's "Designing Data Intensive Applications" and Ken Birman's "Guide to Reliable Distributed Systems".
- Reviewed in the United States on April 16, 2021. Format: Paperback. Verified Purchase.
I've been looking for a book that covers these topics for a long time. Even just working with different databases on a day-to-day basis, it's incredibly helpful to understand how the components of each database actually work. Furthermore, the book spans a very wide array of topics and techniques that are incredibly handy for distributed systems. It's really hard to find this much information in a single book. Usually you'd have to know each of the topics you're interested in and buy an entire book on that topic. This book packs a pretty in-depth view of several topics related to database systems into one book without needless fluff.
I highly recommend this book not only to people working on distributed data systems, but to anyone working with databases. This is one of the most frequently referenced books I own.
- Reviewed in the United States on December 29, 2024. Format: Paperback. Verified Purchase.
Strong recommend to anyone from a beginner to an expert
Top reviews from other countries
Vladimir Kazanov, reviewed in the United Kingdom on February 28, 2020
5.0 out of 5 stars: A very good book with developers already working with databases and database-like systems
Format: Paperback. Verified Purchase.
There are two infinitely big and comparably old topics in software engineering: compilers and databases. Both have traditions and history, both are recognised as deep research topics, with developers and academics working on related problems for decades.
It's really hard to get an overview of the way databases work, given how diverse and, well, *big* they really are. Decades of practical experience don't mean one has a clear understanding of query processing, optimisation, storage subsystems, transaction processing, concurrency control, etc.
Sometimes, just sometimes, mortals get lucky and somebody writes a survey of a subfield, or an extended overview, of relevant problems. Best example I am aware of: the Red Book aka Readings in Database Systems. It's a vast survey of academic work on databases. But it's more of a collection of paper references than a linear reading.
Database Internals also feels a bit like an extended survey: numerous paper references, no code, mostly conceptual explanations. What stands out is its good linear narration, gradually coming up with definitions and clarifying explanations.
So, what this book is not: an introductory text, a textbook, a theory-centric volume, or a practice-centric work.
What this book is: a survey of typical approaches to two major aspects of databases (local storage subsystems and problems of distributed systems). An interested reader will have to follow the references; a casual reader will get familiar with terminology and common concepts in a condensed way.
I would (and definitely will) recommend the book to people already working with databases for at least a few years looking for additional insights or an overview of the field.
Clément Grimault, reviewed in Spain on July 2, 2023
5.0 out of 5 stars: Great book
Format: Paperback. Verified Purchase.
Amazing book, in my top 3 technical books. I learned a lot; it goes really deep and explains everything very well. I would suggest having at least a good understanding of database basics before starting, though (indexes, distributed systems).
hailizhang, reviewed in Canada on August 27, 2025
5.0 out of 5 stars: Great value with true knowledge!
Format: Paperback. Verified Purchase.
The book is really good, mint without any scratches. Love it!
Amazon Customer, reviewed in India on February 12, 2026
5.0 out of 5 stars: Good read, and came in well wrapped
I read other reviews, which said "this book is very technical". That made me think that this book was going to have the source code of database internals.
But these are just theories.
Because of that, I am a little disappointed.
If you are an experienced programmer, you can read and absorb this book's content in a single day.
It is a good book otherwise.
Amazon Customer, reviewed in Germany on December 20, 2025
5.0 out of 5 stars: Everything you need to know.
Format: Paperback. Verified Purchase.
Very good summary of the essential knowledge.