Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems 1st Edition


Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?

In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.

  • Peer under the hood of the systems you already use, and learn how to use and operate them more effectively
  • Make informed decisions by identifying the strengths and weaknesses of different tools
  • Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
  • Understand the distributed systems research upon which modern databases are built
  • Peek behind the scenes of major online services, and learn from their architectures

There is a newer edition of this item:

From the Publisher

Designing Data-Intensive Applications

Who Should Read This Book?

If you develop applications that have some kind of server/backend for storing or processing data, and your applications use the internet (e.g., web applications, mobile apps, or internet-connected sensors), then this book is for you.

This book is for software engineers, software architects, and technical managers who love to code. It is especially relevant if you need to make decisions about the architecture of the systems you work on—for example, if you need to choose tools for solving a given problem and figure out how best to apply them. But even if you have no choice over your tools, this book will help you better understand their strengths and weaknesses.

You should have some experience building web-based applications or network services, and you should be familiar with relational databases and SQL. Familiarity with non-relational databases and other data-related tools is a plus, but not required.

A general understanding of common network protocols like TCP and HTTP is helpful. Your choice of programming language or framework makes no difference for this book.

If any of the following are true for you, you’ll find this book valuable:

  • You want to learn how to make data systems scalable, for example, to support web or mobile apps with millions of users.
  • You need to make applications highly available (minimizing downtime) and operationally robust.
  • You are looking for ways of making systems easier to maintain in the long run, even as they grow and as requirements and technologies change.
  • You have a natural curiosity for the way things work and want to know what goes on inside major websites and online services. This book breaks down the internals of various databases and data processing systems, and it’s great fun to explore the bright thinking that went into their design.

Sometimes, when discussing scalable data systems, people make comments along the lines of, 'You’re not Google or Amazon. Stop worrying about scale and just use a relational database'. There is truth in that statement: building for scale that you don’t need is wasted effort and may lock you into an inflexible design. In effect, it is a form of premature optimization. However, it’s also important to choose the right tool for the job, and different technologies each have their own strengths and weaknesses. As we shall see, relational databases are important but not the final word on dealing with data.

Scope of This Book

This book does not attempt to give detailed instructions on how to install or use specific software packages or APIs, since there is already plenty of documentation for those things. Instead we discuss the various principles and trade-offs that are fundamental to data systems, and we explore the different design decisions taken by different products.

We look primarily at the architecture of data systems and the ways they are integrated into data-intensive applications. This book doesn’t have space to cover deployment, operations, security, management, and other areas—those are complex and important topics, and we wouldn’t do them justice by making them superficial side notes in this book. They deserve books of their own.

Many of the technologies described in this book fall within the realm of the Big Data buzzword. However, the term 'Big Data' is so overused and underdefined that it is not useful in a serious engineering discussion. This book uses less ambiguous terms, such as single-node versus distributed systems, or online/interactive versus offline/batch processing systems.

This book has a bias toward free and open source software (FOSS), because reading, modifying, and executing source code is a great way to understand how something works in detail. Open platforms also reduce the risk of vendor lock-in. However, where appropriate, we also discuss proprietary software (closed-source software, software as a service, or companies’ in-house software that is only described in literature but not released publicly).

Editorial Reviews

About the Author

Martin is a researcher in distributed systems at the University of Cambridge. Previously he was a software engineer and entrepreneur at Internet companies including LinkedIn and Rapportive, where he worked on large-scale data infrastructure. In the process he learned a few things the hard way, and he hopes this book will save you from repeating the same mistakes.



Martin is a regular conference speaker, blogger, and open source contributor. He believes that profound technical ideas should be accessible to everyone, and that deeper understanding will help us develop better software.

About the author

Martin Kleppmann

Martin Kleppmann is a researcher in distributed systems and security at the University of Cambridge, and author of Designing Data-Intensive Applications (O'Reilly Media, 2017). Previously he was a software engineer and entrepreneur at Internet companies including LinkedIn and Rapportive, where he worked on large-scale data infrastructure. He is now working on TRVE DATA, a project that aims to bring end-to-end encryption and decentralisation to a wide range of applications.

Customer reviews

4.8 out of 5 stars
5,495 global ratings

Customers say

Customers find this book provides a thorough theoretical overview of data-intensive systems, with detailed explanations of modern techniques and real-world big data architecture. Moreover, the book is well-structured, serves as an incredible resource for software engineers, and helps prepare for system design interviews. Customers appreciate its clarity, with one noting how the author makes complex concepts simple, and its readability, with one mentioning it's readable even on complex topics.
AI Generated from the text of customer reviews

122 customers mention insight, 120 positive, 2 negative
Customers find the book provides thorough and useful information, examining a long list of data-related paradigms, with one customer noting it masterfully summarizes 30+ years of theory.
...I wish I had started earlier. This book is very comprehensive and informative. The author makes complex concepts look simple....
My engineering manager recommended this book to me. It is very insightful, and I recommend everybody read it.
Insightful
...A great reference and thought-provoker.
89 customers mention readability, 83 positive, 6 negative
Customers find the book highly readable, particularly for software engineers, systems architects, and developers, with one customer noting it explains complex topics clearly.
...whip out the big guns right off the bat - this is probably the best technical book that I've ever read and it's very likely I'll be returning to it...
...This is one of the best technical books that I've encountered, and I go back to it again and again....
The book is very readable. The author has done an excellent job of unfolding the complexity layer by layer with great real-life examples....
...This book is a great resource, it’s laid out well and easy to read. The last section is weaker than the rest, but it’s still worth buying.
73 customers mention depth, 67 positive, 6 negative
Customers appreciate the depth of the book, which provides an amazing overview and goes into extreme detail, particularly in explaining real-world big data architecture.
A great overview of what’s going on in system design today....
An exceptional book. Comprehensive, unusually coherent and readable even on complex topics. In ~500 pages I caught only ~1-2 editing errors....
Very detailed but also handles everything from a high-level perspective, which makes it easy for a developer to implement....
Painstakingly researched and well-documented book. One stop for a lot of innovation and research in current distributed data.
46 customers mention use, 45 positive, 1 negative
Customers find the book practical and useful, particularly as a resource for software engineers and system design interviews, with one customer noting it provides real-world examples for everything.
Must-have for software engineers
This is a very well written and practical book covering data-intensive applications, as the title claims....
I agree with other similar reviews. This book is a great resource, it’s laid out well and easy to read....
Very good book with narratives and examples. Cannot stop.
39 customers mention design, 37 positive, 2 negative
Customers appreciate the book's design approach, noting its well-thought-out structure and well-organized content, with one customer describing it as a foundational book for system design at scale.
...This book, Designing Data-Intensive Applications, is an eye-opener for anyone who has ever wondered about the internals of these services....
...Book is nicely organized with lots of references and nice definitions.
This book dives deep into its subject with clear structure and thoughtful explanations....
I like this book. It has a lot of well structured information. It filled gaps in my knowledge.
33 customers mention clarity, 29 positive, 4 negative
Customers find the book's concepts clear, with one customer noting how it combines theory and practice, while another mentions how the author makes complex ideas simple.
Ideas are explained in detail with real-world examples. Concepts are clear and intriguing. Helped me crack system design interviews.
...It will give you a solid understanding of how to choose the right tech for different use cases....
...This book helped me in getting a clear understanding of every topic it has covered.
Really informative, and not too technical....
32 customers mention writing style, 31 positive, 1 negative
Customers appreciate the writing style of the book, describing it as very well written, with one customer noting its simple prose and another highlighting how it serves as a standard for technology writing.
My favorite technical book currently. Very well written, and it talks about a lot of interesting things that I encounter on a daily basis.
...recapping historical designs in the first half, but it is so well written with useable information, that it is an enjoyable read from cover to cover...
...The replacement copy is of much better quality. Such a well-written and well-researched book does not deserve to be printed this way....
The book content is good and very well-written. However, the quality of the paperback book is bad. The pages keep falling out of the binding...
23 customers mention book content, 23 positive, 0 negative
Customers find the book's content good, with one customer highlighting its extensive breadth and another noting its foundational material on distributed systems.
The book content is good and very well-written. However, the quality of the paperback book is bad. The pages keep falling out of the binding...
Excellent content, but paper and ink quality are terrible - at least for my copy....
The content of the book is good. But the way the book was packaged was only satisfactory. The outer cover in which it was wrapped was torn.
Pretty interesting book, lots of useful material, would’ve been better if more details were provided on some topics.
Designing Data-Intensive Applications
5 out of 5 stars
Fantastic book for software engineers or designers who want to learn in depth about crucial subjects related to mission-critical applications. Instead of only adding Spring annotations for transactional isolation and propagation, I now understand much of what happens under the hood, along with other related aspects we need to consider when choosing the right tool for each scenario.

Top reviews from the United States

  • Reviewed in the United States on June 1, 2020
    Format: Paperback
    Verified Purchase
    Designing Data-Intensive Applications really exceeded my expectations. Even if you are experienced in this area, this book will reinforce things you know (or sort of know) and bring to light new ways of thinking about solving distributed systems and data problems. It will give you a solid understanding of how to choose the right tech for different use cases.

    The book really pulls you in with an intro that is more high level, but mentions problems and solutions that anyone who has worked on these types of applications has either encountered or heard mentioned. The promise it makes is to take issues such as scalability, maintainability, and durability and explain how to decide on the right solutions for the problems you are solving. It does an amazing job of that throughout the book.

    This book covers a lot, but at the same time it knows exactly when to go deep on a subject. Right when it seems like it may be going too deep on things like how different types of databases are implemented (SSTables, B-trees, etc.) or on comparing different consensus algorithms, it is quick to point out how and why those things are important to practical real-world problems and how understanding those things is actually vital to the success of a system.

    Along those same lines, it is excellent at circling back to concepts introduced at prior points in the book. For example, the book goes into how log-based storage is used by some databases as their core way of storing data, and for durability in other cases. Later in the book, when getting into different message/eventing systems such as Kafka and ActiveMQ, things swing back to how these systems utilize log-based storage in similar ways. Even if you have prior knowledge of or have worked with these technologies, how and why they work and the pros and cons of each become crystal clear and really solidified. The same can be said of its great explanations of things like ZooKeeper and why specific solutions like Kafka make use of it.

    This book is also amazing at shedding light on the fact that so little of what is out there is totally new. At times it goes back as far as it can to where a certain technology's ideas originated (back to the 1800s at some points!). Bringing in this history gives a lot of context around the original problems that were being solved, which in turn helps in understanding pros and cons. One example is the way it goes through the history of batch processing systems and HDFS. The author starts with MapReduce and relates it to tech that was developed decades before. This really clarifies how we got from batch processing systems on proprietary hardware to things like MapReduce on commodity hardware, thanks in part to HDFS, and eventually to stream-based processing. It also does a great job of explaining the pros and cons of each and when one might choose one technology over the other.

    That's really the theme of this book: teaching the reader how to compare and contrast different technologies for solving distributed systems and data problems. It teaches you to read between the lines on how certain technologies work so that you can identify the pros and cons early, without needing them to be spelled out by the authors of those technologies. When thinking about databases, it teaches you to really consider the durability/scalability model and how things are nowhere near black and white between "consistent" and "eventually consistent". There is a ton of nuance there, and it goes deep on things like single-leader vs. multi-leader vs. leaderless replication, linearizability, total order broadcast, and different consensus algorithms.

    I could go on forever about this book. To name a few other things it touches on, to give a good idea of the breadth here: networking (and networking faults), OLAP, OLTP, two-phase locking, graph databases, two-phase commit, data encoding, general fault tolerance, compatibility, message passing, everything I mentioned above, and the list goes on and on. I recommend that anyone who does any kind of work with these systems take the time to read this book. All 600ish pages are worth reading, and it's presented in an excellent, engaging way with real-world practical examples for everything.
    41 people found this helpful
  • Reviewed in the United States on September 17, 2025
    Format: Paperback
    Verified Purchase
    In today’s world, many of us have been tasked with building reliable, scalable services. Yet, more often than not, we rely on existing abstractions without fully understanding how the underlying systems work. Need a scalable database? Use MongoDB. Need a streaming service? Kafka is your go-to. While these tools get the job done, they often serve as crutches that prevent us from delving into the complexities of distributed systems. This book, Designing Data-Intensive Applications, is an eye-opener for anyone who has ever wondered about the internals of these services. It takes a deep dive into key concepts like consistency, exploring the critical differences between strong and weak consistency, and the trade-offs that come with each approach. For example, when a master node fails, how does a new master get elected? The book explains this process in depth, shedding light on the mechanics of fault tolerance. The book also provides clarity on how databases store and retrieve data efficiently. If you’ve ever come across PostgreSQL’s documentation and wondered, "What exactly is a B-tree?", this book will make it crystal clear.

    It also goes into the common gotchas when working with transactions. You might think that using transactions makes you safe from concurrency issues, but that’s not always the case. The book explains why this happens and offers practical advice on how to avoid race conditions.

    What this book isn’t: If you’re a practitioner building distributed systems from scratch and looking for in-depth explanations of algorithms like Raft, Paxos, or other low-level details, this book might not be what you’re looking for. It serves more as a high-level introduction to distributed systems rather than a deep dive into the specifics of consensus algorithms. For those looking for more detailed, foundational material on distributed systems, I’d recommend checking out Tanenbaum’s Distributed Systems.
    4 people found this helpful
  • Reviewed in the United States on June 3, 2025
    This book dives deep into its subject with clear structure and thoughtful explanations. The concepts are well-articulated and build on each other logically. However, to truly appreciate the depth and get the most out of it, I recommend reading it with some prior experience or familiarity with the topic. Overall, a highly valuable and rewarding read.
    3 people found this helpful

Top reviews from other countries

  • Nikola Zifra
    3.0 out of 5 stars Lacks details
    Reviewed in the United Arab Emirates on September 18, 2024
    Format: Paperback
    Verified Purchase
    This book provides a high-level overview but unfortunately lacks quite a bit of detail.
  • Joachim O.
    5.0 out of 5 stars Great in-depth analysis of data architectures
    Reviewed in the United Kingdom on November 17, 2024
    This book covers pretty much all topics which are relevant to managing databases or designing data models in more than 800 pages. It also provides detailed information about the inner workings of databases to the degree that you might be able to implement your own simple database.

    The book is very well structured didactically, which is no surprise given that the author is a professor at Cambridge. For example, it explains batch processing algorithms (e.g. MapReduce) and uses this as a basis to delve into data streaming. Strong emphasis is placed on the problems of distributed computing (replication, partitioning, node failures, etc.) and on the compromises one must make.

    Overall, an easy recommendation for anyone who is interested in data architectures and the inner workings of databases, which are the backbone of pretty much any application in today’s world.
  • Vladyslav
    5.0 out of 5 stars Great book if you want to get into systems design
    Reviewed in Poland on February 7, 2026
    Format: Paperback
    Verified Purchase
    The book's topics vary in complexity, and everyone can find fascinating insights for themselves.
  • Andrea
    5.0 out of 5 stars One of the best technical books I have ever read
    Reviewed in Italy on June 9, 2022
    Format: Paperback
    Verified Purchase
    If you work in IT, are passionate about your job, and want to understand what lies beneath the things you use every day, this is a book not to be missed. It is not a manual, a guide, or a tutorial, but it lives up to its subtitle: it is a "journey" through what is known about computerized data management, and it helps you understand the tools at our disposal beyond the marketing.

    The book is extremely dense (as shown by a fine 30-page index out of a total of almost 600 pages), rich in references (as shown by the extensive bibliographies at the end of each chapter, mostly online resources), and the author's academic background is evident. It is a book that takes time to read and understand, provided you don't skip the details, that is... but in that case, don't bother.

    A good half of the book concerns concurrent modification of data and distributed systems, the most terrifying and fascinating part, where the problems they present and the algorithms that solve them are explained in minute detail (with the exception of "Byzantine" problems). I found the analysis of the ACID acronym... "comforting" :)

    It closes with an analysis of what the author expects for the future; the concept of "unbundling" databases is very interesting.
  • Amazon Customer
    5.0 out of 5 stars Insightful read on data management
    Reviewed in Singapore on January 7, 2026
    Format: Paperback
    Verified Purchase
    The book provides solid insights into managing data, particularly at scale. It helped clarify concepts I'd been grappling with and offered practical perspectives that go beyond surface-level explanations. Would recommend for anyone looking to deepen their understanding of large-scale data systems.