1©2014 TransLattice, Inc. All Rights Reserved. 1 Supersized PostgreSQL: Postgres-XL for Scale-out OLTP and Big Data Analytics June 17, 2014 Mason Sharp
2©2014 TransLattice, Inc. All Rights Reserved. Agenda •  Postgres-XL Background •  Architecture Overview •  Distributing Tables •  Configuring a Local Cluster •  Backup and Recovery •  Differences to PostgreSQL and Postgres-XC •  Community
3©2014 TransLattice, Inc. All Rights Reserved. Announced May 2014
4©2014 TransLattice, Inc. All Rights Reserved. Postgres-XL n  Open Source n  Scale-out RDBMS –  Massive Parallel Processing –  OLTP write and read scalability –  Multi-tenant security –  Cluster-wide ACID properties –  Can also act as distributed key-value store n  Mixed workloads, can eliminate need for separate clusters in some cases
5©2014 TransLattice, Inc. All Rights Reserved. XL Origins Considered open sourcing very early
6©2014 TransLattice, Inc. All Rights Reserved. Postgres-XL Applications and Use Case Examples Postgres-XL Application Use Case Example Business intelligence requiring MPP parallelism Financial services big data analysis of opportunities across service lines OLTP write-intensive workloads Online ad tracking Mixed workload environments Real-time reporting on a transactional system
8©2014 TransLattice, Inc. All Rights Reserved. Postgres-XL Technology •  Data is automatically spread across cluster •  Queries return back data from all nodes in parallel •  The Global Transaction Manager maintains a consistent view of the data across cluster •  Connect to any coordinator
9©2014 TransLattice, Inc. All Rights Reserved. Postgres-XL Features •  Multi Version Concurrency Control •  Referential Integrity* •  Full Text Search •  Geo-spatial Functions and Data Types •  JSON and XML Support •  Distributed Key-Value Store
10©2014 TransLattice, Inc. All Rights Reserved. Postgres-XL Missing Features •  Built in High Availability •  Use external like Corosync/Pacemaker •  Area for future improvement •  “Easy” to re-shard / add nodes •  Some pieces there •  Can however “pre-shard” multiple nodes on the same server or VM •  Some FK and UNIQUE constraints
11©2014 TransLattice, Inc. All Rights Reserved. Postgres-XL Connectors Looks like a PostgreSQL 9.2 database •  C •  C++ •  Perl •  Python •  PHP •  Erlang •  Haskell – Java – Javascript – .NET – Node.js – Ruby – Scala
13©2014 TransLattice, Inc. All Rights Reserved. Performance Transactional work loads 10 50 100 200 300 400 0 2000 4000 6000 8000 10000 12000 14000 16000 18000 Dell DVD Benchmark StormDB Amazon RDS
14©2014 TransLattice, Inc. All Rights Reserved. MPP Performance – DBT-1 (TPC-H) Postgres-XL PostgreSQL Postgres-XL
15©2014 TransLattice, Inc. All Rights Reserved. Key-value Store Comparison Postgres-XL MongoDB
17©2014 TransLattice, Inc. All Rights Reserved. Postgres-XL Architecture •  Based on the Postgres-XC project •  Coordinators •  Connection point for applications •  Parsing and planning of queries •  Data Nodes •  Actual data storage •  Local execution of queries •  Global Transaction Manager (GTM) •  Provides a consistent view to transactions •  GTM Proxy to increase performance
18©2014 TransLattice, Inc. All Rights Reserved. Coordinators •  Handles network connectivity to the client •  Parse and plan statements •  Perform final query processing of intermediate result sets •  Manages two-phase commit •  Stores global catalog information
19©2014 TransLattice, Inc. All Rights Reserved. Data Nodes •  Stores tables and indexes •  Only coordinators connects to data nodes •  Executes queries from the coordinators •  Communicates with peer data nodes for distributed joins
20©2014 TransLattice, Inc. All Rights Reserved. Global Transaction Manager (GTM) •  Handles necessary MVCC tasks •  Transaction IDs •  Snapshots •  Manages cluster wide values •  Timestamps •  Sequences •  GTM Standby can be configured
21©2014 TransLattice, Inc. All Rights Reserved. GTM Proxy •  Runs alongside Coordinator or Datanode •  Groups requests together •  Obtain range of transaction ids (XIDs) •  Obtain snapshot
22©2014 TransLattice, Inc. All Rights Reserved. Data Distribution •  Replicated Tables •  Each row is replicated to all data nodes •  Statement based replication •  Distributed Tables •  Each row is stored on one data node •  Available strategies •  Hash •  Round Robin •  Modulo
23©2014 TransLattice, Inc. All Rights Reserved. Availability •  No Single Point of Failure •  Global Transaction Manager Standby •  Coordinators are load balanced •  Streaming replication for data nodes •  But, currently manual…
24©2014 TransLattice, Inc. All Rights Reserved. DDL
25©2014 TransLattice, Inc. All Rights Reserved. Defining Tables CREATE TABLE my_table (…) DISTRIBUTE BY HASH(col) | MODULO(col) | ROUNDROBIN | REPLICATION [ TO NODE (nodename[,nodename…])]
26©2014 TransLattice, Inc. All Rights Reserved. Defining Tables CREATE TABLE state_code ( state_code CHAR(2), state_name VARCHAR, : ) DISTRIBUTE BY REPLICATION An exact replica on each node
27©2014 TransLattice, Inc. All Rights Reserved. Defining Tables CREATE TABLE invoice ( invoice_id SERIAL, cust_id INT, : ) DISTRIBUTE BY HASH(invoice_id) Distributed evenly amongst sharded nodes
28©2014 TransLattice, Inc. All Rights Reserved. Defining Tables CREATE TABLE invoice ( invoice_id SERIAL, cust_id INT …. ) DISTRIBUTE BY HASH(invoice_id); CREATE TABLE lineitem ( lineitem_id SERIAL, invoice_id INT … ) DISTRIBUTE BY HASH(invoice_id); Joins on invoice_id can be pushed down to the datanodes
29©2014 TransLattice, Inc. All Rights Reserved. test2=# create table t1 (col1 int, col2 int) distribute by hash(col1); test2=# create table t2 (col3 int, col4 int) distribute by hash(col3);
30©2014 TransLattice, Inc. All Rights Reserved. explain select * from t1 inner join t2 on col1 = col3; Remote Subquery Scan on all (datanode_1,datanode_2) -> Hash Join Hash Cond: -> Seq Scan on t1 -> Hash -> Seq Scan on t2 Join push-down. Looks much like regular PostgreSQL
31©2014 TransLattice, Inc. All Rights Reserved. test2=# explain select * from t1 inner join t2 on col2 = col3; Remote Subquery Scan on all (datanode_1,datanode_2) -> Hash Join Hash Cond: -> Remote Subquery Scan on all (datanode_1,datanode_2) Distribute results by H: col2 -> Seq Scan on t1 -> Hash -> Seq Scan on t2 Will read t1.col2 once and put in shared queue for consumption for other datanodes for joining. Datanode-datanode communication and parallelism
34©2014 TransLattice, Inc. All Rights Reserved.
35©2014 TransLattice, Inc. All Rights Reserved. Configuration
36©2014 TransLattice, Inc. All Rights Reserved.
37©2014 TransLattice, Inc. All Rights Reserved. Use pgxc_ctl! (demo in a few minutes)
38©2014 TransLattice, Inc. All Rights Reserved. Otherwise, PostgreSQL-like steps apply: initgtm initdb
39©2014 TransLattice, Inc. All Rights Reserved. postgresql.conf •  Very similar to regular PostgreSQL •  But there are extra parameters •  max_pool_size •  min_pool_size •  pool_maintenance _timeout •  remote_query_cost •  network_byte_cost •  sequence_range •  pooler_port •  gtm_host, gtm_port •  shared_queues •  shared_queue_size
40©2014 TransLattice, Inc. All Rights Reserved. Install n  Download RPMs –  http://sourceforge.net/projects/postgres-xl/ files/Releases/Version_9.2rc/ Or n  configure; make; make install
41©2014 TransLattice, Inc. All Rights Reserved. Setup Environment •  .bashrc •  ssh used, so ad installation bin dir to PATH
42©2014 TransLattice, Inc. All Rights Reserved. Avoid password with pgxc_ctl •  ssh-keygen –t rsa (in ~/.ssh) •  For local test, no need to copy key, already there •  cat ~/.ssh/id_rsa.pub >> ~/.ssh/ authorized_keys
43©2014 TransLattice, Inc. All Rights Reserved. pgxc_ctl •  Initialize cluster •  Start/stop •  Deploy to node •  Add coordinator •  Add data node / slave •  Add GTM slave
44©2014 TransLattice, Inc. All Rights Reserved. pgxc_ctl.conf •  Let’s create a local test cluster! •  One GTM •  One Coordinator •  Two Datanodes Make sure all are using different ports •  including pooler_port
45©2014 TransLattice, Inc. All Rights Reserved. pgxc_ctl.conf pgxcOwner=pgxl pgxcUser=$pgxcOwner pgxcInstallDir=/usr/postgres-xl-9.2
46©2014 TransLattice, Inc. All Rights Reserved. pgxc_ctl.conf gtmMasterDir=/data/gtm_master gtmMasterServer=localhost gtmSlave=n gtmProxy=n
47©2014 TransLattice, Inc. All Rights Reserved. pgxc_ctl.conf coordMasterDir=/data/coord_master coordNames=(coord1) coordPorts=(5432) poolerPorts=(20010) coordMasterServers=(localhost) coordMasterDirs=($coordMasterDir/coord1)
48©2014 TransLattice, Inc. All Rights Reserved. pgxc_ctl.conf coordMaxWALSenders=($coordMaxWALsernder) coordSlave=n coordSpecificExtraConfig=(none) coordSpecificExtraPgHba=(none)
49©2014 TransLattice, Inc. All Rights Reserved. pgxc_ctl.conf datanodeNames=(dn1 dn2) datanodeMasterDir=/data/dn_master datanodePorts=(5433 5434) datanodePoolerPorts=(20011 20012) datanodeMasterServers=(localhost localhost) datanodeMasterDirs=($datanodeMasterDir/dn1 $datanodeMasterDir/dn2)
50©2014 TransLattice, Inc. All Rights Reserved. pgxc_ctl.conf datanodeMaxWALSenders=($datanodeMaxWalSe nder $datanodeMaxWalSender) datanodeSlave=n primaryDatanode=dn1
51©2014 TransLattice, Inc. All Rights Reserved. pgxc_ctl pgxc_ctl init all Now have a working cluster
52©2014 TransLattice, Inc. All Rights Reserved. pgxc_ctl – Add a second Coordinator Format: pgxc_ctl add coordinator master name host port pooler data_dir pgxc_ctl add coordinator master coord2 localhost 15432 20020 /data/coord_master/coord2
53©2014 TransLattice, Inc. All Rights Reserved. PostgreSQL Differences
54©2014 TransLattice, Inc. All Rights Reserved. pg_catalog •  pgxc_node •  Coordinator and Datanode definition •  Currently not GTM •  pgxc_class •  Table replication or distribution info
55©2014 TransLattice, Inc. All Rights Reserved. Additional Commands •  PAUSE CLUSTER / UNPAUSE CLUSTER •  Wait for current transactions to complete •  Prevent new commands until UNPAUSE •  EXECUTE DIRECT ON (nodename) ‘command’ •  CLEAN CONNECTION •  Cleans connection pool •  CREATE NODE / ALTER NODE
56©2014 TransLattice, Inc. All Rights Reserved. Noteworthy •  SELECT pgxc_pool_reload() •  pgxc_clean for 2PC •  Cleans prepared transactions •  If committed on one node, must be committed on all •  If aborted on one node, must be aborted on all •  If both committed and aborted output error
57©2014 TransLattice, Inc. All Rights Reserved. Backup and Recovery
58©2014 TransLattice, Inc. All Rights Reserved. Backup •  Like PostgreSQL •  pg_dump, pg_dumpall from any Coordinator •  Will pull all data from all nodes in one central location •  No notion yet of parallel pg_dump of individual nodes on node-by-node basis, nor parallel pg_restore
59©2014 TransLattice, Inc. All Rights Reserved. Cold Backup •  Stop Cluster •  Backup individual data directories on nodes’ file system •  Gzip, compress, rsync off, etc •  Cold restore do the opposite
60©2014 TransLattice, Inc. All Rights Reserved. Hot Backup •  Similar to PostgreSQL •  Base backup individual nodes •  use streaming replication or WAL archiving •  Backup GTM control file •  Contains XID •  Contains Sequence values •  Periodically written to •  On restart, it will skip ahead some values What about PITR?
61©2014 TransLattice, Inc. All Rights Reserved. Hot Backup - Barriers •  CREATE BARRIER ‘barrier_name’ •  Waits for currently committing transactions to finish •  Writes a barrier in each node’s WAL for a consistent restoration point •  Currently done manually! •  Could be configured as background process in the future to periodically write In recovery.conf, use recovery_target_barrier
62©2014 TransLattice, Inc. All Rights Reserved. Similarities and differences compared to Postgres-XC
63©2014 TransLattice, Inc. All Rights Reserved. Code Merging PostgreSQL Postgres-XC Postgres-XL Postgres-XC Enhancements Postgres-XL Enhancements Planner and Executor are different
64©2014 TransLattice, Inc. All Rights Reserved. Similarities with Postgres-XC
65©2014 TransLattice, Inc. All Rights Reserved. Some of the original Postgres-XC developers developed Postgres-XL
66©2014 TransLattice, Inc. All Rights Reserved. Global Transaction Manager
67©2014 TransLattice, Inc. All Rights Reserved. Pooler
68©2014 TransLattice, Inc. All Rights Reserved. pg_catalog
69©2014 TransLattice, Inc. All Rights Reserved. pgxc_* pgxc_ctl, pgxc_node
70©2014 TransLattice, Inc. All Rights Reserved. “pgxc” = Postgres-XL Cluster J
71©2014 TransLattice, Inc. All Rights Reserved. Differences compared to Postgres-XC
72©2014 TransLattice, Inc. All Rights Reserved. n  Performance n  Multi-tenant security n  Statistics
73©2014 TransLattice, Inc. All Rights Reserved. Performance Data node ßà Data node Communication –  Avoid Coordinator-centric processing –  Can be many times faster for some queries Pooler needs to be configured for each Data node, not just coordinator –  More connections needed at each node
74©2014 TransLattice, Inc. All Rights Reserved. Performance Re-parse avoidance In XC, SQL strings are sent to other nodes from Coordinator, where they are reparsed and replanned. In XL, plan once on coordinator, serialize and send over
75©2014 TransLattice, Inc. All Rights Reserved. Performance Sequences Detect frequently accessed sequences, fetch range of values, avoid fewer round trips to GTM COPY INSERT SELECT
76©2014 TransLattice, Inc. All Rights Reserved. Multi-tenant Security User can only get own info from global pg_catalog pg_database, pg_user Harder to hack if names of other database objects are unknown
77©2014 TransLattice, Inc. All Rights Reserved. Statistics storm_stats Basic DML stats by time range pg_stat_statements may be sufficient
78©2014 TransLattice, Inc. All Rights Reserved. Differences Planner and Executor are different
79©2014 TransLattice, Inc. All Rights Reserved. Differences GUC n  remote_query_cost n  network_byte_cost n  pool_maintenance_timeout n  sequence_range n  shared_queues n  shared_queue_size
80©2014 TransLattice, Inc. All Rights Reserved. Looking at Postgres-XL Code n  XC #ifdef PGXC n  XL #ifdef XCP #ifndef XCP
81©2014 TransLattice, Inc. All Rights Reserved. Community
82©2014 TransLattice, Inc. All Rights Reserved. Website and Code n  postgres-xl.org n  sourceforge.net/projects/postgres-xl n  Will add mirror for github.com n  Docs –  files.postgres-xl.org/documentation/index.html
84©2014 TransLattice, Inc. All Rights Reserved. Philosophy n  Stability and bug fixes over features n  Performance-focused n  Open and inclusive –  Welcome input for roadmap, priorities and design –  Welcome others to become core members and contributors
85©2014 TransLattice, Inc. All Rights Reserved. Planned Community Structure n  Like PostgreSQL –  Core –  Major Contributors –  Minor Contributors –  Advocacy
87©2014 TransLattice, Inc. All Rights Reserved. Roadmap n  Catch up to PostgreSQL Community -> 9.3, 9.4 n  Make XL a more robust analytical platform –  Distributed Foreign Data Wrappers n  Remove blockers to adoption n  Native High Availability
88©2014 TransLattice, Inc. All Rights Reserved. Thank You! msharp@translattice.com @mason_db

Supersized PostgreSQL: Postgres-XL for Scale-Out OLTP and Big Data Analytics

  • 1.
    1©2014 TransLattice, Inc.All Rights Reserved. 1 Supersized PostgreSQL: Postgres-XL for Scale-out OLTP and Big Data Analytics June 17, 2014 Mason Sharp
  • 2.
    2©2014 TransLattice, Inc.All Rights Reserved. Agenda •  Postgres-XL Background •  Architecture Overview •  Distributing Tables •  Configuring a Local Cluster •  Backup and Recovery •  Differences to PostgreSQL and Postgres-XC •  Community
  • 3.
    3©2014 TransLattice, Inc.All Rights Reserved. Announced May 2014
  • 4.
    4©2014 TransLattice, Inc.All Rights Reserved. Postgres-XL n  Open Source n  Scale-out RDBMS –  Massive Parallel Processing –  OLTP write and read scalability –  Multi-tenant security –  Cluster-wide ACID properties –  Can also act as distributed key-value store n  Mixed workloads, can eliminate need for separate clusters in some cases
  • 5.
    5©2014 TransLattice, Inc.All Rights Reserved. XL Origins Considered open sourcing very early
  • 6.
    6©2014 TransLattice, Inc.All Rights Reserved. Postgres-XL Applications and Use Case Examples Postgres-XL Application Use Case Example Business intelligence requiring MPP parallelism Financial services big data analysis of opportunities across service lines OLTP write-intensive workloads Online ad tracking Mixed workload environments Real-time reporting on a transactional system
  • 7.
    8©2014 TransLattice, Inc.All Rights Reserved. Postgres-XL Technology •  Data is automatically spread across cluster •  Queries return back data from all nodes in parallel •  The Global Transaction Manager maintains a consistent view of the data across cluster •  Connect to any coordinator
  • 8.
    9©2014 TransLattice, Inc.All Rights Reserved. Postgres-XL Features •  Multi Version Concurrency Control •  Referential Integrity* •  Full Text Search •  Geo-spatial Functions and Data Types •  JSON and XML Support •  Distributed Key-Value Store
  • 9.
    10©2014 TransLattice, Inc.All Rights Reserved. Postgres-XL Missing Features •  Built in High Availability •  Use external like Corosync/Pacemaker •  Area for future improvement •  “Easy” to re-shard / add nodes •  Some pieces there •  Can however “pre-shard” multiple nodes on the same server or VM •  Some FK and UNIQUE constraints
  • 10.
    11©2014 TransLattice, Inc.All Rights Reserved. Postgres-XL Connectors Looks like a PostgreSQL 9.2 database •  C •  C++ •  Perl •  Python •  PHP •  Erlang •  Haskell – Java – Javascript – .NET – Node.js – Ruby – Scala
  • 11.
    13©2014 TransLattice, Inc.All Rights Reserved. Performance Transactional work loads 10 50 100 200 300 400 0 2000 4000 6000 8000 10000 12000 14000 16000 18000 Dell DVD Benchmark StormDB Amazon RDS
  • 12.
    14©2014 TransLattice, Inc.All Rights Reserved. MPP Performance – DBT-1 (TPC-H) Postgres-XL PostgreSQL Postgres-XL
  • 13.
    15©2014 TransLattice, Inc.All Rights Reserved. Key-value Store Comparison Postgres-XL MongoDB
  • 14.
    17©2014 TransLattice, Inc.All Rights Reserved. Postgres-XL Architecture •  Based on the Postgres-XC project •  Coordinators •  Connection point for applications •  Parsing and planning of queries •  Data Nodes •  Actual data storage •  Local execution of queries •  Global Transaction Manager (GTM) •  Provides a consistent view to transactions •  GTM Proxy to increase performance
  • 15.
    18©2014 TransLattice, Inc.All Rights Reserved. Coordinators •  Handles network connectivity to the client •  Parse and plan statements •  Perform final query processing of intermediate result sets •  Manages two-phase commit •  Stores global catalog information
  • 16.
    19©2014 TransLattice, Inc.All Rights Reserved. Data Nodes •  Stores tables and indexes •  Only coordinators connects to data nodes •  Executes queries from the coordinators •  Communicates with peer data nodes for distributed joins
  • 17.
    20©2014 TransLattice, Inc.All Rights Reserved. Global Transaction Manager (GTM) •  Handles necessary MVCC tasks •  Transaction IDs •  Snapshots •  Manages cluster wide values •  Timestamps •  Sequences •  GTM Standby can be configured
  • 18.
    21©2014 TransLattice, Inc.All Rights Reserved. GTM Proxy •  Runs alongside Coordinator or Datanode •  Groups requests together •  Obtain range of transaction ids (XIDs) •  Obtain snapshot
  • 19.
    22©2014 TransLattice, Inc.All Rights Reserved. Data Distribution •  Replicated Tables •  Each row is replicated to all data nodes •  Statement based replication •  Distributed Tables •  Each row is stored on one data node •  Available strategies •  Hash •  Round Robin •  Modulo
  • 20.
    23©2014 TransLattice, Inc.All Rights Reserved. Availability •  No Single Point of Failure •  Global Transaction Manager Standby •  Coordinators are load balanced •  Streaming replication for data nodes •  But, currently manual…
  • 21.
    24©2014 TransLattice, Inc.All Rights Reserved. DDL
  • 22.
    25©2014 TransLattice, Inc.All Rights Reserved. Defining Tables CREATE TABLE my_table (…) DISTRIBUTE BY HASH(col) | MODULO(col) | ROUNDROBIN | REPLICATION [ TO NODE (nodename[,nodename…])]
  • 23.
    26©2014 TransLattice, Inc.All Rights Reserved. Defining Tables CREATE TABLE state_code ( state_code CHAR(2), state_name VARCHAR, : ) DISTRIBUTE BY REPLICATION An exact replica on each node
  • 24.
    27©2014 TransLattice, Inc.All Rights Reserved. Defining Tables CREATE TABLE invoice ( invoice_id SERIAL, cust_id INT, : ) DISTRIBUTE BY HASH(invoice_id) Distributed evenly amongst sharded nodes
  • 25.
    28©2014 TransLattice, Inc.All Rights Reserved. Defining Tables CREATE TABLE invoice ( invoice_id SERIAL, cust_id INT …. ) DISTRIBUTE BY HASH(invoice_id); CREATE TABLE lineitem ( lineitem_id SERIAL, invoice_id INT … ) DISTRIBUTE BY HASH(invoice_id); Joins on invoice_id can be pushed down to the datanodes
  • 26.
    29©2014 TransLattice, Inc.All Rights Reserved. test2=# create table t1 (col1 int, col2 int) distribute by hash(col1); test2=# create table t2 (col3 int, col4 int) distribute by hash(col3);
  • 27.
    30©2014 TransLattice, Inc.All Rights Reserved. explain select * from t1 inner join t2 on col1 = col3; Remote Subquery Scan on all (datanode_1,datanode_2) -> Hash Join Hash Cond: -> Seq Scan on t1 -> Hash -> Seq Scan on t2 Join push-down. Looks much like regular PostgreSQL
  • 28.
    31©2014 TransLattice, Inc.All Rights Reserved. test2=# explain select * from t1 inner join t2 on col2 = col3; Remote Subquery Scan on all (datanode_1,datanode_2) -> Hash Join Hash Cond: -> Remote Subquery Scan on all (datanode_1,datanode_2) Distribute results by H: col2 -> Seq Scan on t1 -> Hash -> Seq Scan on t2 Will read t1.col2 once and put in shared queue for consumption for other datanodes for joining. Datanode-datanode communication and parallelism
  • 29.
    34©2014 TransLattice, Inc.All Rights Reserved.
  • 30.
    35©2014 TransLattice, Inc.All Rights Reserved. Configuration
  • 31.
    36©2014 TransLattice, Inc.All Rights Reserved.
  • 32.
    37©2014 TransLattice, Inc.All Rights Reserved. Use pgxc_ctl! (demo in a few minutes)
  • 33.
    38©2014 TransLattice, Inc.All Rights Reserved. Otherwise, PostgreSQL-like steps apply: initgtm initdb
  • 34.
    39©2014 TransLattice, Inc.All Rights Reserved. postgresql.conf •  Very similar to regular PostgreSQL •  But there are extra parameters •  max_pool_size •  min_pool_size •  pool_maintenance _timeout •  remote_query_cost •  network_byte_cost •  sequence_range •  pooler_port •  gtm_host, gtm_port •  shared_queues •  shared_queue_size
  • 35.
    40©2014 TransLattice, Inc.All Rights Reserved. Install n  Download RPMs –  http://sourceforge.net/projects/postgres-xl/ files/Releases/Version_9.2rc/ Or n  configure; make; make install
  • 36.
    41©2014 TransLattice, Inc.All Rights Reserved. Setup Environment •  .bashrc •  ssh used, so ad installation bin dir to PATH
  • 37.
    42©2014 TransLattice, Inc.All Rights Reserved. Avoid password with pgxc_ctl •  ssh-keygen –t rsa (in ~/.ssh) •  For local test, no need to copy key, already there •  cat ~/.ssh/id_rsa.pub >> ~/.ssh/ authorized_keys
  • 38.
    43©2014 TransLattice, Inc.All Rights Reserved. pgxc_ctl •  Initialize cluster •  Start/stop •  Deploy to node •  Add coordinator •  Add data node / slave •  Add GTM slave
  • 39.
    44©2014 TransLattice, Inc.All Rights Reserved. pgxc_ctl.conf •  Let’s create a local test cluster! •  One GTM •  One Coordinator •  Two Datanodes Make sure all are using different ports •  including pooler_port
  • 40.
    45©2014 TransLattice, Inc.All Rights Reserved. pgxc_ctl.conf pgxcOwner=pgxl pgxcUser=$pgxcOwner pgxcInstallDir=/usr/postgres-xl-9.2
  • 41.
    46©2014 TransLattice, Inc.All Rights Reserved. pgxc_ctl.conf gtmMasterDir=/data/gtm_master gtmMasterServer=localhost gtmSlave=n gtmProxy=n
  • 42.
    47©2014 TransLattice, Inc.All Rights Reserved. pgxc_ctl.conf coordMasterDir=/data/coord_master coordNames=(coord1) coordPorts=(5432) poolerPorts=(20010) coordMasterServers=(localhost) coordMasterDirs=($coordMasterDir/coord1)
  • 43.
    48©2014 TransLattice, Inc.All Rights Reserved. pgxc_ctl.conf coordMaxWALSenders=($coordMaxWALsernder) coordSlave=n coordSpecificExtraConfig=(none) coordSpecificExtraPgHba=(none)
  • 44.
    49©2014 TransLattice, Inc.All Rights Reserved. pgxc_ctl.conf datanodeNames=(dn1 dn2) datanodeMasterDir=/data/dn_master datanodePorts=(5433 5434) datanodePoolerPorts=(20011 20012) datanodeMasterServers=(localhost localhost) datanodeMasterDirs=($datanodeMasterDir/dn1 $datanodeMasterDir/dn2)
  • 45.
    50©2014 TransLattice, Inc.All Rights Reserved. pgxc_ctl.conf datanodeMaxWALSenders=($datanodeMaxWalSe nder $datanodeMaxWalSender) datanodeSlave=n primaryDatanode=dn1
  • 46.
    51©2014 TransLattice, Inc.All Rights Reserved. pgxc_ctl pgxc_ctl init all Now have a working cluster
  • 47.
    52©2014 TransLattice, Inc.All Rights Reserved. pgxc_ctl – Add a second Coordinator Format: pgxc_ctl add coordinator master name host port pooler data_dir pgxc_ctl add coordinator master coord2 localhost 15432 20020 /data/coord_master/coord2
  • 48.
    53©2014 TransLattice, Inc.All Rights Reserved. PostgreSQL Differences
  • 49.
    54©2014 TransLattice, Inc.All Rights Reserved. pg_catalog •  pgxc_node •  Coordinator and Datanode definition •  Currently not GTM •  pgxc_class •  Table replication or distribution info
  • 50.
    55©2014 TransLattice, Inc.All Rights Reserved. Additional Commands •  PAUSE CLUSTER / UNPAUSE CLUSTER •  Wait for current transactions to complete •  Prevent new commands until UNPAUSE •  EXECUTE DIRECT ON (nodename) ‘command’ •  CLEAN CONNECTION •  Cleans connection pool •  CREATE NODE / ALTER NODE
  • 51.
    56©2014 TransLattice, Inc.All Rights Reserved. Noteworthy •  SELECT pgxc_pool_reload() •  pgxc_clean for 2PC •  Cleans prepared transactions •  If committed on one node, must be committed on all •  If aborted on one node, must be aborted on all •  If both committed and aborted output error
  • 52.
    57©2014 TransLattice, Inc.All Rights Reserved. Backup and Recovery
  • 53.
    58©2014 TransLattice, Inc.All Rights Reserved. Backup •  Like PostgreSQL •  pg_dump, pg_dumpall from any Coordinator •  Will pull all data from all nodes in one central location •  No notion yet of parallel pg_dump of individual nodes on node-by-node basis, nor parallel pg_restore
  • 54.
    59©2014 TransLattice, Inc.All Rights Reserved. Cold Backup •  Stop Cluster •  Backup individual data directories on nodes’ file system •  Gzip, compress, rsync off, etc •  Cold restore do the opposite
  • 55.
    60©2014 TransLattice, Inc.All Rights Reserved. Hot Backup •  Similar to PostgreSQL •  Base backup individual nodes •  use streaming replication or WAL archiving •  Backup GTM control file •  Contains XID •  Contains Sequence values •  Periodically written to •  On restart, it will skip ahead some values What about PITR?
  • 56.
    61©2014 TransLattice, Inc.All Rights Reserved. Hot Backup - Barriers •  CREATE BARRIER ‘barrier_name’ •  Waits for currently committing transactions to finish •  Writes a barrier in each node’s WAL for a consistent restoration point •  Currently done manually! •  Could be configured as background process in the future to periodically write In recovery.conf, use recovery_target_barrier
  • 57.
    62©2014 TransLattice, Inc.All Rights Reserved. Similarities and differences compared to Postgres-XC
  • 58.
    63©2014 TransLattice, Inc.All Rights Reserved. Code Merging PostgreSQL Postgres-XC Postgres-XL Postgres-XC Enhancements Postgres-XL Enhancements Planner and Executor are different
  • 59.
    64©2014 TransLattice, Inc.All Rights Reserved. Similarities with Postgres-XC
  • 60.
    65©2014 TransLattice, Inc.All Rights Reserved. Some of the original Postgres-XC developers developed Postgres-XL
  • 61.
    66©2014 TransLattice, Inc.All Rights Reserved. Global Transaction Manager
  • 62.
    67©2014 TransLattice, Inc.All Rights Reserved. Pooler
  • 63.
    68©2014 TransLattice, Inc.All Rights Reserved. pg_catalog
  • 64.
    69©2014 TransLattice, Inc.All Rights Reserved. pgxc_* pgxc_ctl, pgxc_node
  • 65.
    70©2014 TransLattice, Inc.All Rights Reserved. “pgxc” = Postgres-XL Cluster J
  • 66.
    71©2014 TransLattice, Inc.All Rights Reserved. Differences compared to Postgres-XC
  • 67.
    72©2014 TransLattice, Inc.All Rights Reserved. n  Performance n  Multi-tenant security n  Statistics
  • 68.
    73©2014 TransLattice, Inc.All Rights Reserved. Performance Data node ßà Data node Communication –  Avoid Coordinator-centric processing –  Can be many times faster for some queries Pooler needs to be configured for each Data node, not just coordinator –  More connections needed at each node
  • 69.
    74©2014 TransLattice, Inc.All Rights Reserved. Performance Re-parse avoidance In XC, SQL strings are sent to other nodes from Coordinator, where they are reparsed and replanned. In XL, plan once on coordinator, serialize and send over
  • 70.
    75©2014 TransLattice, Inc.All Rights Reserved. Performance Sequences Detect frequently accessed sequences, fetch range of values, avoid fewer round trips to GTM COPY INSERT SELECT
  • 71.
    76©2014 TransLattice, Inc.All Rights Reserved. Multi-tenant Security User can only get own info from global pg_catalog pg_database, pg_user Harder to hack if names of other database objects are unknown
  • 72.
    77©2014 TransLattice, Inc.All Rights Reserved. Statistics storm_stats Basic DML stats by time range pg_stat_statements may be sufficient
  • 73.
    78©2014 TransLattice, Inc.All Rights Reserved. Differences Planner and Executor are different
  • 74.
    79©2014 TransLattice, Inc.All Rights Reserved. Differences GUC n  remote_query_cost n  network_byte_cost n  pool_maintenance_timeout n  sequence_range n  shared_queues n  shared_queue_size
  • 75.
    80©2014 TransLattice, Inc.All Rights Reserved. Looking at Postgres-XL Code n  XC #ifdef PGXC n  XL #ifdef XCP #ifndef XCP
  • 76.
    81©2014 TransLattice, Inc.All Rights Reserved. Community
  • 77.
    82©2014 TransLattice, Inc.All Rights Reserved. Website and Code n  postgres-xl.org n  sourceforge.net/projects/postgres-xl n  Will add mirror for github.com n  Docs –  files.postgres-xl.org/documentation/index.html
  • 78.
    84©2014 TransLattice, Inc.All Rights Reserved. Philosophy n  Stability and bug fixes over features n  Performance-focused n  Open and inclusive –  Welcome input for roadmap, priorities and design –  Welcome others to become core members and contributors
  • 79.
    85©2014 TransLattice, Inc.All Rights Reserved. Planned Community Structure n  Like PostgreSQL –  Core –  Major Contributors –  Minor Contributors –  Advocacy
  • 80.
    87©2014 TransLattice, Inc.All Rights Reserved. Roadmap n  Catch up to PostgreSQL Community -> 9.3, 9.4 n  Make XL a more robust analytical platform –  Distributed Foreign Data Wrappers n  Remove blockers to adoption n  Native High Availability
  • 81.
    88©2014 TransLattice, Inc.All Rights Reserved. Thank You! msharp@translattice.com @mason_db