PostgreSQL High Availability & Scaling
John Paulett, October 26, 2009

Overview
Scaling Overview – Horizontal & Vertical Options
High Availability Overview
Other Options
Suggested Architecture
Hardware Discussion

What are we trying to solve?
Survive server failure? – Support an uptime SLA (e.g. 99.9999%)
Application scaling? – Support additional application demand
→ Many options, each optimized for different constraints

Scaling Overview

How To Scale
Horizontal Scaling – the “Google” approach
– Distribute load across multiple servers
– Requires appropriate application architecture
Vertical Scaling – the “Big Iron” approach
– Single, massive machine (lots of fast processors, RAM, & hard drives)

Horizontal DB Scaling
Load Balancing – distribute operations across multiple servers
Partitioning – cut up the data (horizontal) or tables (vertical) and put them on separate servers; aka “sharding”

Basic Problem when Load Balancing
It is difficult to maintain consistent state between servers (remember ACID), especially when dealing with writes.
Four PostgreSQL load balancing methods:
– Master-Slave Replication
– Statement-Based Replication Middleware
– Asynchronous Multimaster Replication
– Synchronous Multimaster Replication

Master-Slave Replication
Master handles writes, slaves handle reads
Asynchronous replication – possible data loss on master failure
Slony-I
– Does not automatically propagate schema changes
– Does not offer a single connection point
– Requires a separate solution for master failover

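Because Slony-I offers no single connection point, the application itself must route writes to the master and reads to a slave. A minimal sketch of that routing, assuming the psycopg2 driver and made-up hostnames (none of this is from the deck):

  import psycopg2

  # Hypothetical hosts; Slony-I replicates changes from db-master to db-slave1.
  master = psycopg2.connect("host=db-master dbname=app user=app")
  slave = psycopg2.connect("host=db-slave1 dbname=app user=app")

  def record_login(user_id):
      # Writes must always go to the master.
      with master.cursor() as cur:
          cur.execute("UPDATE users SET last_login = now() WHERE id = %s", (user_id,))
      master.commit()

  def recent_logins(limit=10):
      # Reads can be served by a slave, but may lag slightly behind the
      # master because Slony-I replication is asynchronous.
      with slave.cursor() as cur:
          cur.execute("SELECT id, last_login FROM users"
                      " ORDER BY last_login DESC LIMIT %s", (limit,))
          return cur.fetchall()
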
Statement-Based Replication Middleware
Intercepts SQL queries, sends writes to all servers and reads to any one server
Possible issues with random(), CURRENT_TIMESTAMP, & sequences
pgpool-II – connection pooling, replication, load balancing, parallel queries, failover

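As a rough sketch of what this looks like in pgpool-II (hostnames, ports, and values below are placeholders, not recommendations from the deck), the application connects to pgpool and pgpool fans writes out to both backends while balancing reads:

  # pgpool.conf (excerpt) - hypothetical two-node setup
  listen_addresses = '*'
  port = 9999

  # Backend PostgreSQL servers
  backend_hostname0 = 'db1.example.com'
  backend_port0 = 5432
  backend_weight0 = 1
  backend_hostname1 = 'db2.example.com'
  backend_port1 = 5432
  backend_weight1 = 1

  # Send every write to all backends, spread reads across them
  replication_mode = on
  load_balance_mode = on

  # Connection pooling
  num_init_children = 32
  max_pool = 4
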
pgpool-II (architecture diagram)

Synchronous Multimaster Replication
Writes & reads on any server
Not implemented in PostgreSQL, but application code can mimic it via two-phase commit

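A minimal sketch of the two-phase commit building block PostgreSQL itself provides (the table and transaction names are invented for illustration): the application runs the same statements on every server and only issues COMMIT PREPARED once all of them have prepared successfully. This requires max_prepared_transactions > 0 in postgresql.conf.

  -- Run against each participating server:
  BEGIN;
  UPDATE accounts SET balance = balance - 100 WHERE id = 42;
  PREPARE TRANSACTION 'txn_42_debit';

  -- If every server prepared successfully:
  COMMIT PREPARED 'txn_42_debit';

  -- If any server failed to prepare, undo the others:
  -- ROLLBACK PREPARED 'txn_42_debit';
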
Load Balancing Issue
Scaling writes breaks down at a certain point, since every write must still be applied on every server

Partitioning
Requires heavy application modification
Queries that span partitions are problematic (sometimes impossible)
PL/Proxy can help

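As a sketch of how PL/Proxy helps (the cluster name, function, and table are hypothetical, and the 'userdb' cluster must be defined separately through PL/Proxy's configuration functions), a proxy function hashes the key and runs the real function on whichever partition owns that user:

  -- On the proxy database: route the call to the right partition
  CREATE FUNCTION get_user_email(i_username text)
  RETURNS text AS $$
      CLUSTER 'userdb';
      RUN ON hashtext(i_username);
  $$ LANGUAGE plproxy;

  -- On each partition: the actual implementation
  CREATE FUNCTION get_user_email(i_username text)
  RETURNS text AS $$
      SELECT email FROM users WHERE username = i_username;
  $$ LANGUAGE sql;
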
Vertical DB Scaling
“Buying a bigger box is quick(ish). Redesigning software is not.” – Cal Henderson, Flickr
37signals Basecamp upgraded to a 128 GB DB server: “don’t need to pay the complexity tax yet” – David Heinemeier Hansson, Ruby on Rails

Sites Running on a Single DB
StackOverflow – MS SQL, 48 GB RAM, RAID 1 for OS, RAID 10 for data
37signals Basecamp – MySQL, 128 GB RAM, Dell R710 or Dell 2950

High Availability Overview

High Availability
Application stays up even after a node failure
– (Also try to prevent failure with appropriate hardware)
PostgreSQL High Availability Options
– pgpool-II
– Shared Disk Failover
– File System Replication
– Warm Standby with Point-In-Time Recovery (PITR)
Often still need a heartbeat application

Shared Disk Failover
Use a single disk array to hold the database's data files
– Network Attached Storage (NAS)
– Network File System (NFS)
The disk array is a single point of failure
Need a heartbeat to bring the 2nd server online

File System Replication
The file system is mirrored to another computer
DRBD – Linux block-device replication
Need a heartbeat to bring the 2nd server online

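A rough sketch of the kind of resource definition DRBD uses (hostnames, devices, and addresses are placeholders); protocol C makes the replication synchronous, so the standby's copy of the PostgreSQL data directory never falls behind:

  # /etc/drbd.conf (excerpt) - hypothetical two-node resource
  resource r0 {
      protocol C;
      on db1 {
          device    /dev/drbd0;
          disk      /dev/sda7;
          address   10.0.0.1:7788;
          meta-disk internal;
      }
      on db2 {
          device    /dev/drbd0;
          disk      /dev/sda7;
          address   10.0.0.2:7788;
          meta-disk internal;
      }
  }
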
Point-in-Time Recovery
“Log shipping”
– Write-Ahead Logs are sent to, and replayed on, the standby
– Included in PostgreSQL 8.0+
– Asynchronous – potential loss of data
Warm Standby
– Standby's hardware should be very similar to the primary's
– Need a heartbeat to bring the 2nd server online

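A minimal log-shipping sketch for PostgreSQL 8.x using pg_standby; the paths and the standby hostname are placeholders, not from the deck:

  # postgresql.conf on the primary: ship completed WAL segments to the standby
  archive_mode = on            # 8.3 and later
  archive_command = 'rsync -a %p standby:/var/lib/pgsql/wal_archive/%f'

  # recovery.conf on the warm standby: replay WAL as it arrives, and stop
  # recovering (become the new primary) when the trigger file appears
  restore_command = 'pg_standby -t /tmp/pgsql.trigger /var/lib/pgsql/wal_archive %f %p %r'
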
Heartbeat
“STONITH” (Shoot The Other Node In The Head)
– Prevents multiple nodes from thinking they are the master
Linux-HA
– Creates a cluster and takes nodes out when they fail

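As a sketch of the idea (node names, the virtual IP, and the init script name are assumptions), a classic Linux-HA Heartbeat v1 setup lists the cluster nodes in ha.cf and the resources that should follow the active node in haresources:

  # /etc/ha.d/ha.cf - the cluster nodes and how they exchange heartbeats
  node db1
  node db2
  bcast eth1
  auto_failback off

  # /etc/ha.d/haresources - db1 normally owns the virtual IP and the
  # PostgreSQL init script; if db1 dies, Heartbeat moves both to db2
  db1 IPaddr::10.0.0.100/24 postgresql
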
Additional Options

Additional Options
Tune PostgreSQL
– Defaults are designed to “run anywhere”
– pgbench, VACUUM/ANALYZE
Tune Queries
– EXPLAIN
Caching (avoid the database)
– memcached
– Ehcache

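For example (the settings and sizes below are illustrative starting points for a dedicated box, not recommendations from the deck), tuning, benchmarking, and query analysis look roughly like this:

  # postgresql.conf - the shipped defaults are tiny
  shared_buffers = 6GB            # ~25% of RAM is a common rule of thumb
  effective_cache_size = 16GB     # what the OS page cache is expected to hold
  work_mem = 32MB                 # per-sort / per-hash memory
  checkpoint_segments = 32        # more WAL between checkpoints

  $ pgbench -i -s 50 benchdb        # initialize test data at scale factor 50
  $ pgbench -c 10 -t 1000 benchdb   # 10 clients, 1000 transactions each

  -- See why a query is slow:
  EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

  -- Keep planner statistics fresh:
  VACUUM ANALYZE orders;
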
Radical Additional Options
“NoSQL” databases – CouchDB, MongoDB, HBase, Cassandra, Redis
– Document store
– Map/Reduce querying

Suggested Architecture

Current Production Setup
DB and web server on the same machine
No failover

Suggested Architecture
2 nice machines
Point-in-Time Recovery with Heartbeat
Tune PostgreSQL
Monitor & improve slow queries
Add in Ehcache as we touch code
→ Leave horizontal scaling for another day

Initial Architecture: High Availability (diagram)

Future Architecture
Scale application servers out horizontally as needed
Improve DB hardware

Hardware Options
PostgreSQL is typically constrained by RAM & disk I/O, not the processor
64-bit, with as much memory as possible
Data array – RAID 10 with 4 drives (not RAID 5), 15k RPM
Separate OS drive / array

Dell R710
Processor: Xeon
4x 15k HD in RAID 10
24 GB (3x 8 GB) RAM (up to 6x 16 GB)
= $6,905

Other Considerations
The Test environment should mimic Production
– Same database setup
– Provides an environment for experimentation
Can host multiple DBs on a single cluster

References
http://37signals.com/svn/posts/1509-mr-moore-gets-to-punt-on-sharding
http://37signals.com/svn/posts/1819-basecamp-now-with-more-vroom
http://anchor.com.au/hosting/dedicated/Tuning_PostgreSQL_on_your_Dedicated_Server
http://blogs.amd.co.at/robe/2009/05/testing-postgresql-replication-solutions-log-shipping-with-pg-standby.html
http://blog.stackoverflow.com/2009/01/new-stack-overflow-servers-ready/
http://developer.postgresql.org/pgdocs/postgres/high-availability.html
http://developer.postgresql.org/pgdocs/postgres/pgbench.html
https://developer.skype.com/SkypeGarage/DbProjects/PlProxy
http://wiki.postgresql.org/wiki/Performance_Optimization
http://www.postgresql.org/docs/8.4/static/warm-standby.html
http://www.postgresql.org/files/documentation/books/aw_pgsql/hw_performance/
http://www.slony.info/

Additional Links
http://ehcache.org/
http://highscalability.com/skype-plans-postgresql-scale-1-billion-users
http://www.25hoursaday.com/weblog/2009/01/16/BuildingScalableDatabasesProsAndConsOfVariousDatabaseShardingSchemes.aspx
http://www.danga.com/memcached/
http://www.mysqlperformanceblog.com/2009/08/06/why-you-dont-want-to-shard/
http://www.slideshare.net/iamcal/scalable-web-architectures-common-patterns-and-approaches-web-20-expo-nyc-presentation