Priit Piipuu & Ilmar Kerm
A portfolio of brands within sportsbook, casino and games
• Started as a DBA twenty years ago, still OCP (11g)
• Now gone over to the dark side: database developer in Kindred Group
• Main job is to help developers use Oracle technologies in the best way possible
• Senior database administrator in Kindred Group
• Working in IT since 2000
• Working with Oracle database since 2003
• Working at Kindred since 2007/2015
• From LAMP developer to Oracle DBA
• Oracle ACE
Blog: https://ilmarkerm.eu
Email: ilmar.kerm@kindredgroup.com
Twitter: @ilmarkerm
• Threats against application high availability
• Why this presentation: isn't buying RAC enough?
• What tools are available?
• The acronyms:
  ‒ TAF: Transparent Application Failover
  ‒ FAN: Fast Application Notification
  ‒ ONS: Oracle Notification Service
  ‒ FCF + UCP: Fast Connection Failover and Universal Connection Pool
  ‒ AC: Application Continuity
• How it all fits together
  ‒ What the application sees during a failover event
  ‒ How things work in the real world
• We always seem to think about unplanned database downtime
  ‒ Node crashes, hardware failure, network and storage failure
  ‒ Software crashes
  ‒ Software bugs
  ‒ User errors (Bobby Droptables)
  ‒ Cosmic-scale catastrophe that brings down the data centre
• But don't forget about planned downtime
  ‒ Regular security patching, both OS and database!
  ‒ Database upgrades. It is 2018 and Oracle STILL CANNOT do it in rolling mode!
  ‒ Environment rebuilding. Years pass, we all recognise some mistakes („legacy“) that require full environment rebuilds ☺
• Which one of them is more frequent in your environment?
• DG: Data Guard
  ‒ Real-time copy of your database
  ‒ Not writable, but can be opened for reading (Active Data Guard, for extra €€€)
  ‒ Replication can be synchronous (or not)
  ‒ Role transition can be automated (Fast-Start Failover)
• GoldenGate
  ‒ Logical replication
• RAC: Real Application Clusters
  ‒ Single database serviced read/write by multiple instances
  ‒ Strongly consistent
  ‒ Can span short distances (max 100 km)
• GDS: Global Data Services
  ‒ Hides configuration topology complexity from applications
• Flashback
  ‒ Various technologies designed to fix accidental user errors
  ‒ Not covered in this presentation
• We already purchased RAC, so Oracle says we are now highly available?
  ‒ Not true: Oracle gives you many tools to achieve high availability, but you have to use them properly.
• True high availability is not possible without client-side help.
  ‒ Here we will see how the client-side tools provided by Oracle work internally
  ‒ And how to avoid the mistakes we made (and I think we have not been alone)
• Maintains a separate stand-alone real-time copy of the database (standby)
• Standby can be made primary (role transition)
  ‒ Can be automated using Fast-Start Failover
• Role transition is offline for clients
  ‒ Database operations are suspended during the procedure: a few minutes or more
• 18c adds updates on the standby and better hiding of role transitions from clients
• Can be used to make database upgrades much faster using a transient logical standby
  ‒ Database operations are switched to the standby, then the primary database is upgraded, after that clients are switched back and the standby can be upgraded
• Long TNS connection strings support listing both primary and standby databases
  ‒ When the primary is not open, clients will automatically connect to the standby
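A connect descriptor listing both sites might look roughly like this (hostnames and the service name are placeholders; the retry/timeout values are illustrative, not recommendations). Only the site where the service is currently running will accept the connection, so after a role transition clients end up on the new primary:

```
failover_demo =
  (DESCRIPTION =
    (CONNECT_TIMEOUT = 5)(TRANSPORT_CONNECT_TIMEOUT = 3)(RETRY_COUNT = 3)
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = primary-scan.example.com)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = standby-scan.example.com)(PORT = 1521)))
    (CONNECT_DATA = (SERVICE_NAME = app_service.example.com)))
```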
• A single database is opened read/write by multiple instances while maintaining strong consistency
• A single connection from the application still connects to a single instance
  ‒ If that instance fails (or is shut down), the client needs to reconnect
  ‒ Reconnection can be automated on the client side using TAF or FCF
• Adding and removing instances is disruptive, even during planned maintenance
  ‒ AKA the “RAC brownout”, needed to reorganise the database buffer cache
  ‒ Significantly improved in 12.2 using buddy instances
  ‒ Complete freeze when an instance fails: min 30 seconds + reconfiguration time
• Helps with many planned maintenance scenarios, when they can be done in rolling mode
  ‒ Quarterly DB patching, OS patching, OS reconfiguration
• Does not help with major database upgrades
• In case of failure in the persistence layer, the database cleans up its own mess. But!
• In case of service/instance/node failure, networking resources might not be cleaned up
  ‒ Resolving networking timeouts takes time, up to 127 s
• The Fast Connection Failover mechanism clears stale connections
• Graceful failover in case of planned outages
  ‒ Stale connections are retired gracefully
  ‒ New connection attempts are sent to the surviving nodes
• Extra features: connection load balancing and connection affinity
• The part of the database connection string that directs the client to the correct database
  ‒ Connecting using SID has been deprecated since 9i (or 8i?). So do not do it!!!
  ‒ Services are the only way to connect applications to pluggable databases
• Can be used to distinguish different applications connecting to the same database
  ‒ A unique service for each application
  ‒ Per-service performance metrics
• Can also be used to enable default high availability features on clients
  ‒ Setting TAF attributes
  ‒ Setting FCF attributes
  ‒ Turning on Application Continuity
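Creating a dedicated service per application is done on the server side with srvctl. A sketch, reusing the database name from the FAN event example later in this deck and hypothetical instance names (the -notification flag enables FAN events for OCI clients):

```
srvctl add service -db ptecdb2 -service fan_test \
    -preferred ptecdb21,ptecdb22 -notification TRUE
srvctl start service -db ptecdb2 -service fan_test
```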
• The client-side driver will detect database failure and reconnect silently
• SELECT statements can be silently restarted
• Transactions are lost
• The JDBC thin driver does not support TAF
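TAF is configured in the connect descriptor, for example like this (hostname is a placeholder, the service name reuses the one from the FAN example later in the deck; RETRIES/DELAY values are illustrative). TYPE=SELECT makes the driver re-run in-flight queries after reconnecting:

```
(DESCRIPTION =
  (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan.example.com)(PORT = 1521))
  (CONNECT_DATA =
    (SERVICE_NAME = fan_test.example.com)
    (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 20)(DELAY = 3))))
```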
‒ Mechanism that notifies other processes about service configuration and status changes
‒ Provides visibility into Oracle services: status messages can apply to services, instances or nodes
‒ A FAN status message is published immediately when a state change occurs
‒ Messages can be consumed through
  • an Oracle client (JDBC, UCP, OCI, ODP.NET),
  • the FAN API,
  • or server-side callouts
‒ Supported by RAC, Data Guard and GDS
• Needs Clusterware?
• What FAN helps to avoid:
  ‒ Waiting on TCP/IP timeouts when a node fails to close its sockets, or connecting to an IP address that is down
  ‒ Attempting to connect to a service that is down
  ‒ Not connecting when services resume
  ‒ Processing results on the client side after the service has already gone down
  ‒ Executing work on suboptimal nodes
• Used to propagate notifications between Oracle Clusterware and the FAN client (UCP)
• With UCP 12.1.0.1 it must be configured explicitly; starting with 12.1.0.2 it is configured automatically
  ‒ In 12.1.0.1 and before: defaultDataSource.setONSConfiguration("nodes=node1:4200,node2:4200,node3:4200");
  ‒ This can be a massive source of configuration inflexibility and pain
• In UCP 12.2, ONS, FAN and FCF (Fast Connection Failover) are enabled automatically when appropriate
• Example FAN event, published when a service member goes down:

  (event origin:00000000.00000000.00000000.0a45800e!6200;
   stamp:5b9b8241.000f6b2d; Version:1.0;
   eventType:database/event/service;
   affectedComponents:; affectedNodes:;
   generatingComponent:database/rac/service;
   generatingProcess:7239@oc1n13.example.com;
   generatingNode:oc1n13.example.com;
   eventId:28468@oc1n14.example.com:0:1010440:1541760309;
   creationTime:1541760309;
   clusterId:dbInstance_oc1n13.example.com_6200;
   clusterName:dbInstance_oc1n13.example.com_6200;
   instanceId:dbInstance_oc1n13.example.com_6200;
   instanceName:dbInstance_oc1n13.example.com_6200;
   LocalOnly:false; numberOfProperties:11;
   Content-Length:210; SubscriberID:1;
   {reason=USER, database=ptecdb2, instance=ptecdb24,
    event_type=SERVICEMEMBER, timezone=+01:00,
    service=fan_test.example.com, db_domain=pte.example.com,
    host=oc1n12, timestamp=2018-11-09 11:45:09, status=down})

• The header identifies the cluster id, cluster name, instance id and instance name
• Event type can be one of: service, service member, node, instance, database, service metrics
• Reason can be one of: USER, FAILURE, member_leave, public_nw_down, BOOT
• Status can be one of: up, down, nodedown, not_restarting
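The trailing {key=value, ...} block is what carries the properties a FAN client actually acts on (service, status, reason). A minimal sketch of pulling those properties out of the raw event text, for illustration only; this is hand-rolled parsing, not Oracle's simplefan API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FanPayloadParser {

    // Extract the {key=value, key=value, ...} property block from a raw FAN event.
    static Map<String, String> parse(String event) {
        Map<String, String> props = new LinkedHashMap<>();
        int start = event.indexOf('{');
        int end = event.lastIndexOf('}');
        if (start < 0 || end < start) {
            return props; // no property block present
        }
        for (String pair : event.substring(start + 1, end).split(",\\s*")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                props.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return props;
    }

    public static void main(String[] args) {
        String event = "(event ...;SubscriberID:1;{reason=USER, database=ptecdb2, "
                + "instance=ptecdb24, event_type=SERVICEMEMBER, timezone=+01:00, "
                + "service=fan_test.example.com, db_domain=pte.example.com, "
                + "host=oc1n12, timestamp=2018-11-09 11:45:09, status=down})";
        Map<String, String> p = parse(event);
        // prints: fan_test.example.com is down because USER
        System.out.println(p.get("service") + " is " + p.get("status")
                + " because " + p.get("reason"));
    }
}
```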
‒ Preconfigured FAN client-side integration, opaque to the application code
‒ The FCF client automatically subscribes to and acts on FAN events
‒ Uses the FAN connection signature to match database sessions with events
‒ Acts based on event type, status and reason:
  • status=down, reason=FAILURE: failure at the database server; FCF aborts the related connections immediately and dead connections are removed from pooled clients
  • status=down, reason=PLANNED: the FCF client drains the sessions ahead of planned maintenance
  • status=up: posted when a service starts for the first time or resumes; FCF reallocates sessions so that load is balanced across the cluster
  • Load %: enables runtime load balancing and QoS features
‒ Oracle is optimised to work with long-running sessions
  • To put it mildly
  • Creating a new connection/session is expensive and should not be done for every incoming web request
‒ Workaround: create a pool of database connections on the application server
  • Initially, a set of available connections is created
  • An incoming web request borrows one of the available connections
  • The application returns the connection to the pool when done
  • Sessions get reused a lot
‒ Problems for high availability:
  • When some connections are terminated, how should the connection pool know about it?
  • Testing connections before lending them out can add significant overhead
  • Some connections could go to overloaded instances
‒ Connections can be tested before lending them out
  • By sending a commit or “select 1 from dual” and checking the result
  • If the test fails, that single connection is removed from the pool
‒ Maximum connection lifetime helps with planned maintenance
  • Stop services on an instance
  • Wait for the maximum connection lifetime to pass
  • The instance should then be empty of client connections
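The test-on-borrow idea can be sketched without any Oracle dependency: a pool that runs a cheap validation (standing in for “select 1 from dual”) before lending a connection out, and simply discards the ones that fail. The ValidatingPool class and its generic Supplier/Predicate hooks are hypothetical; a real pool such as UCP does this internally when validate-on-borrow is enabled:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Minimal test-on-borrow pool sketch (not thread-safe, illustration only).
public class ValidatingPool<C> {
    private final Deque<C> available = new ArrayDeque<>();
    private final Supplier<C> factory;    // creates a new physical connection
    private final Predicate<C> validate;  // cheap liveness check before lending out

    public ValidatingPool(Supplier<C> factory, Predicate<C> validate, int initialSize) {
        this.factory = factory;
        this.validate = validate;
        for (int i = 0; i < initialSize; i++) available.push(factory.get());
    }

    // Borrow a connection; stale ones are dropped, a fresh one is created if needed.
    public C borrow() {
        while (!available.isEmpty()) {
            C c = available.pop();
            if (validate.test(c)) return c; // healthy: lend it out
            // failed validation: discard the stale connection silently
        }
        return factory.get();               // pool exhausted: open a new one
    }

    public void giveBack(C c) { available.push(c); }

    public int idleCount() { return available.size(); }
}
```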
‒ Oracle-provided JDBC connection pool implementing most of the high availability features
‒ Universal: supposed to work with all well-behaved JDBC drivers
‒ Free to use
‒ Supports many cool features
  • Sharding!
‒ Acts as the client for Fast Application Notification
  • The feature is called Fast Connection Failover (FCF)
• Unplanned outages
  ‒ Finds stale connections whose service is no longer available on any instance in the cluster, based on service-down or node-down events
  ‒ Operates on both borrowed and available connections; kills the physical connections
  ‒ For a borrowed connection, the application receives an exception
  ‒ Connections are not replaced in the pool; the application is supposed to retry
• Planned outages
  ‒ Stale borrowed connections are marked and will be aborted when returned to the pool
  ‒ Does not affect any ongoing transactions
  ‒ The main difference from an unplanned outage is how borrowed connections are handled
  ‒ Graceful connection draining is available starting with 12.1.0.2
    • oracle.ucp.PlannedDrainingPeriod
• Good:
  ‒ Performance: no cluster waits
• Bad:
  ‒ Service relocation creates DOWN events for SERVICE and SERVICEMEMBER
  ‒ UCP gets the message and does a hard failover
  ‒ All the sessions are killed and everything is dead until the corresponding UP messages arrive
• Bad:
  ‒ Not so good for performance (cluster waits)
• Good:
  ‒ Planned outages do not cause major interruption
• Ugly:
  ‒ Is UCP receiving the FAN messages? Is it taking action?
  ‒ Badly behaving applications
• Runtime connection rebalancing
  ‒ Routes connections to the instance that has the best performance available
  ‒ Adjusts the distribution of work based on backend capabilities (CPU capacity or response time)
  ‒ Reacts quickly to cluster reconfiguration, overworked nodes or hangs
  ‒ Configured on the server side
• Connection affinity
  ‒ Allows the connection pool to select connections directed to a specific instance
  ‒ Transaction-based affinity
  ‒ Web session affinity
  ‒ Needs a callback to be created (callback as in a Java class implementing the ConnectionAffinityCallback interface)
• Only new connections are balanced, so set the timeouts
  ‒ InactiveConnectionTimeout: how long available connections can sit idle before being removed from the connection pool
  ‒ MaxConnectionReuseTime: how long a connection can be used before being gracefully removed from the pool
• FAN events and actions are not logged by UCP, not even at debug level
• How to monitor what's going on?
  ‒ Does simplefan work?
  ‒ FANwatcher to the rescue?
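Putting the UCP pieces together (FCF, validate-on-borrow and the two timeouts above) looks roughly like this. A sketch, not a drop-in configuration: it needs the ojdbc and ucp jars on the classpath, the URL and credentials are placeholders, and the timeout values are illustrative:

```java
import java.sql.SQLException;
import oracle.ucp.jdbc.PoolDataSource;
import oracle.ucp.jdbc.PoolDataSourceFactory;

public class UcpConfig {
    public static PoolDataSource configure() throws SQLException {
        PoolDataSource pds = PoolDataSourceFactory.getPoolDataSource();
        pds.setConnectionFactoryClassName("oracle.jdbc.pool.OracleDataSource");
        pds.setURL("jdbc:oracle:thin:@//rac-scan.example.com:1521/fan_test.example.com");
        pds.setUser("app_user");
        pds.setPassword("secret");

        // FAN/FCF: subscribe to service up/down events and act on them.
        pds.setFastConnectionFailoverEnabled(true);
        // Before 12.1.0.2 the ONS endpoints had to be listed explicitly:
        // pds.setONSConfiguration("nodes=node1:6200,node2:6200");

        // Validate connections before lending them out.
        pds.setValidateConnectionOnBorrow(true);

        // Retire idle and long-lived connections so load gets rebalanced.
        pds.setInactiveConnectionTimeout(300);  // seconds idle before removal
        pds.setMaxConnectionReuseTime(1800);    // seconds before graceful retirement
        return pds;
    }
}
```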
• New 12.1 feature
• The goal is to hide even more instance failures from applications
• The JDBC driver records all transactions and can check the commit outcome after a node failure
• If a transaction was not committed when the instance failed, the JDBC driver will silently replay it
• When using UCP no code changes are needed; otherwise you need to mark the replay boundaries in code
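On the server side, Application Continuity is switched on through service attributes. A sketch reusing the database and service names from earlier slides (the replay/retention values are illustrative):

```
srvctl modify service -db ptecdb2 -service fan_test \
    -failovertype TRANSACTION -commit_outcome TRUE \
    -replay_init_time 300 -retention 86400
```

On the client side the application must then connect through a replay-capable datasource (or through UCP, which handles it transparently).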
• Database connection pool in WebLogic
• Oracle GridLink has built-in support for all the database high availability features and requires no application-side changes
  ‒ Including Application Continuity and connection affinity
• It is an extra-cost option for WebLogic
  ‒ Not possible to use with other application servers

Database failover from the client perspective