
I've set up two identical Linux servers with DB2 10.5. AUTO_BACKUP only works on one of them. In addition, the server where it fails sends no health notifications at all, although postfix is running on localhost and a test message sent with sendmail is delivered.

The message I expect to see in db2diag, but which is missing:

Automatic job "Backup database online" has started on database STAGE_DB, alias STAGE_DB 

I have:

  • HEALTH_MON, AUTO_MAINT and AUTO_DB_BACKUP are all ON
  • Backup criteria: back up if the last backup is older than 1 day
  • The online maintenance window is 24/7
  • db2dasrrm (the DAS) is running

What else should I check?

Window:

    <?xml version="1.0" encoding="UTF-8"?>
    <DB2MaintenanceWindows xmlns="http://www.ibm.com/xmlns/prod/db2/autonomic/config">
      <!-- Online Maintenance Window -->
      <OnlineWindow Occurrence="During" startTime="00:00:00" duration="24">
        <DaysOfWeek>All</DaysOfWeek>
        <DaysOfMonth>All</DaysOfMonth>
        <MonthsOfYear>All</MonthsOfYear>
      </OnlineWindow>
    </DB2MaintenanceWindows>

Backup policy:

    <?xml version="1.0" encoding="UTF-8"?>
    <DB2AutoBackupPolicy xmlns="http://www.ibm.com/xmlns/prod/db2/autonomic/config">
      <!-- Backup Options -->
      <BackupOptions mode="Online">
        <BackupTarget>
          <DiskBackupTarget>
            <PathName>/media/okbackup/rsnapshot/hourly.0/ec-stage-db-1/db2autobak/STAGE_DB/</PathName>
          </DiskBackupTarget>
        </BackupTarget>
      </BackupOptions>
      <!-- Frequency of automatic backups -->
      <BackupCriteria numberOfFullBackups="1" timeSinceLastBackup="24" logSpaceConsumedSinceLastBackup="1000000"/>
    </DB2AutoBackupPolicy>

admin.cfg:

    Admin Server Configuration

    Authentication Type DAS (AUTHENTICATION) = SERVER_ENCRYPT
    DAS Administration Authority Group Name (DASADM_GROUP) = dasadm1
    DAS Discovery Mode (DISCOVER) = SEARCH
    Name of the DB2 Server System (DB2SYSTEM) = EC-STAGE-DB-1
    Java Development Kit Installation Path DAS (JDK_PATH) = AUTOMATIC (/home/dasusr1/das/java/jdk)
    Java Development Kit Installation Path DAS (JDK_64_PATH) = AUTOMATIC (/home/dasusr1/das/java/jdk)
    DAS Code Page (DAS_CODEPAGE) = 0
    DAS Territory (DAS_TERRITORY) = 0
    Location of Contact List (CONTACT_HOST) =
    Execute Expired Tasks (EXEC_EXP_TASK) = NO
    Scheduler Mode (SCHED_ENABLE) = OFF
    SMTP Server (SMTP_SERVER) = localhost
    Tools Catalog Database (TOOLSCAT_DB) =
    Tools Catalog Database Instance (TOOLSCAT_INST) =
    Tools Catalog Database Schema (TOOLSCAT_SCHEMA) =
    Scheduler User ID =
    Diagnostic error capture level (DIAGLEVEL) = 3

dbm.cfg:

    Database Manager Configuration

    Node type = Enterprise Server Edition with local and remote clients
    Database manager configuration release level = 0x1000
    CPU speed (millisec/instruction) (CPUSPEED) = 2.597893e-07
    Communications bandwidth (MB/sec) (COMM_BANDWIDTH) = 1.000000e+02
    Max number of concurrently active databases (NUMDB) = 32
    Federated Database System Support (FEDERATED) = NO
    Transaction processor monitor name (TP_MON_NAME) =
    Default charge-back account (DFT_ACCOUNT_STR) =
    Java Development Kit installation path (JDK_PATH) = /home/db2inst1/sqllib/java/jdk64
    Diagnostic error capture level (DIAGLEVEL) = 3
    Notify Level (NOTIFYLEVEL) = 3
    Diagnostic data directory path (DIAGPATH) = /home/db2inst1/sqllib/db2dump/
    Current member resolved DIAGPATH = /home/db2inst1/sqllib/db2dump/
    Alternate diagnostic data directory path (ALT_DIAGPATH) =
    Current member resolved ALT_DIAGPATH =
    Size of rotating db2diag & notify logs (MB) (DIAGSIZE) = 100

    Default database monitor switches
    Buffer pool (DFT_MON_BUFPOOL) = OFF
    Lock (DFT_MON_LOCK) = OFF
    Sort (DFT_MON_SORT) = OFF
    Statement (DFT_MON_STMT) = OFF
    Table (DFT_MON_TABLE) = OFF
    Timestamp (DFT_MON_TIMESTAMP) = ON
    Unit of work (DFT_MON_UOW) = OFF
    Monitor health of instance and databases (HEALTH_MON) = ON

    SYSADM group name (SYSADM_GROUP) = DB2IADM1
    SYSCTRL group name (SYSCTRL_GROUP) =
    SYSMAINT group name (SYSMAINT_GROUP) =
    SYSMON group name (SYSMON_GROUP) =
    Client Userid-Password Plugin (CLNT_PW_PLUGIN) =
    Client Kerberos Plugin (CLNT_KRB_PLUGIN) =
    Group Plugin (GROUP_PLUGIN) =
    GSS Plugin for Local Authorization (LOCAL_GSSPLUGIN) =
    Server Plugin Mode (SRV_PLUGIN_MODE) = UNFENCED
    Server List of GSS Plugins (SRVCON_GSSPLUGIN_LIST) =
    Server Userid-Password Plugin (SRVCON_PW_PLUGIN) =
    Server Connection Authentication (SRVCON_AUTH) = NOT_SPECIFIED
    Cluster manager =
    Database manager authentication (AUTHENTICATION) = SERVER
    Alternate authentication (ALTERNATE_AUTH_ENC) = NOT_SPECIFIED
    Cataloging allowed without authority (CATALOG_NOAUTH) = NO
    Trust all clients (TRUST_ALLCLNTS) = YES
    Trusted client authentication (TRUST_CLNTAUTH) = CLIENT
    Bypass federated authentication (FED_NOAUTH) = NO
    Default database path (DFTDBPATH) = /home/db2inst1
    Database monitor heap size (4KB) (MON_HEAP_SZ) = AUTOMATIC(90)
    Java Virtual Machine heap size (4KB) (JAVA_HEAP_SZ) = 2048
    Audit buffer size (4KB) (AUDIT_BUF_SZ) = 0
    Global instance memory (4KB) (INSTANCE_MEMORY) = AUTOMATIC(1696865)
    Member instance memory (4KB) = GLOBAL
    Agent stack size (AGENT_STACK_SZ) = 1024
    Sort heap threshold (4KB) (SHEAPTHRES) = 0
    Directory cache support (DIR_CACHE) = YES
    Application support layer heap size (4KB) (ASLHEAPSZ) = 15
    Max requester I/O block size (bytes) (RQRIOBLK) = 65535
    Workload impact by throttled utilities (UTIL_IMPACT_LIM) = 10
    Priority of agents (AGENTPRI) = SYSTEM
    Agent pool size (NUM_POOLAGENTS) = AUTOMATIC(100)
    Initial number of agents in pool (NUM_INITAGENTS) = 0
    Max number of coordinating agents (MAX_COORDAGENTS) = AUTOMATIC(200)
    Max number of client connections (MAX_CONNECTIONS) = AUTOMATIC(MAX_COORDAGENTS)
    Keep fenced process (KEEPFENCED) = YES
    Number of pooled fenced processes (FENCED_POOL) = AUTOMATIC(MAX_COORDAGENTS)
    Initial number of fenced processes (NUM_INITFENCED) = 0
    Index re-creation time and redo index build (INDEXREC) = RESTART
    Transaction manager database name (TM_DATABASE) = 1ST_CONN
    Transaction resync interval (sec) (RESYNC_INTERVAL) = 180
    SPM name (SPM_NAME) = ec_stage
    SPM log size (SPM_LOG_FILE_SZ) = 256
    SPM resync agent limit (SPM_MAX_RESYNC) = 20
    SPM log path (SPM_LOG_PATH) =
    TCP/IP Service name (SVCENAME) = 50000
    Discovery mode (DISCOVER) = SEARCH
    Discover server instance (DISCOVER_INST) = ENABLE
    SSL server keydb file (SSL_SVR_KEYDB) =
    SSL server stash file (SSL_SVR_STASH) =
    SSL server certificate label (SSL_SVR_LABEL) =
    SSL service name (SSL_SVCENAME) =
    SSL cipher specs (SSL_CIPHERSPECS) =
    SSL versions (SSL_VERSIONS) =
    SSL client keydb file (SSL_CLNT_KEYDB) =
    SSL client stash file (SSL_CLNT_STASH) =
    Maximum query degree of parallelism (MAX_QUERYDEGREE) = ANY
    Enable intra-partition parallelism (INTRA_PARALLEL) = NO
    Maximum Asynchronous TQs per query (FEDERATED_ASYNC) = 0
    No. of int. communication buffers (4KB) (FCM_NUM_BUFFERS) = AUTOMATIC(4096)
    No. of int. communication channels (FCM_NUM_CHANNELS) = AUTOMATIC(2048)
    Inter-node comm. parallelism (FCM_PARALLELISM) = 1
    Node connection elapse time (sec) (CONN_ELAPSE) = 10
    Max number of node connection retries (MAX_CONNRETRIES) = 5
    Max time difference between nodes (min) (MAX_TIME_DIFF) = 60
    db2start/db2stop timeout (min) (START_STOP_TIME) = 10
    WLM dispatcher enabled (WLM_DISPATCHER) = NO
    WLM dispatcher concurrency (WLM_DISP_CONCUR) = COMPUTED
    WLM dispatcher CPU shares enabled (WLM_DISP_CPU_SHARES) = NO
    WLM dispatcher min. utilization (%) (WLM_DISP_MIN_UTIL) = 5
    Communication buffer exit library list (COMM_EXIT_LIST) =
    Current effective arch level (CUR_EFF_ARCH_LVL) = V:10 R:5 M:0 F:5 I:0 SB:0
    Current effective code level (CUR_EFF_CODE_LVL) = V:10 R:5 M:0 F:5 I:0 SB:0
    Keystore type (KEYSTORE_TYPE) = NONE
    Keystore location (KEYSTORE_LOCATION) =

db.cfg:

    Database Configuration for Database STAGE_DB

    Database configuration release level = 0x1000
    Database release level = 0x1000
    Database territory = US
    Database code page = 1208
    Database code set = UTF-8
    Database country/region code = 1
    Database collating sequence = IDENTITY
    Alternate collating sequence (ALT_COLLATE) =
    Number compatibility = OFF
    Varchar2 compatibility = OFF
    Date compatibility = OFF
    Database page size = 4096
    Statement concentrator (STMT_CONC) = OFF
    Discovery support for this database (DISCOVER_DB) = ENABLE
    Restrict access = NO
    Default query optimization class (DFT_QUERYOPT) = 5
    Degree of parallelism (DFT_DEGREE) = 1
    Continue upon arithmetic exceptions (DFT_SQLMATHWARN) = NO
    Default refresh age (DFT_REFRESH_AGE) = 0
    Default maintained table types for opt (DFT_MTTB_TYPES) = SYSTEM
    Number of frequent values retained (NUM_FREQVALUES) = 10
    Number of quantiles retained (NUM_QUANTILES) = 20
    Decimal floating point rounding mode (DECFLT_ROUNDING) = ROUND_HALF_EVEN
    Backup pending = NO
    All committed transactions have been written to disk = NO
    Rollforward pending = NO
    Restore pending = NO
    Multi-page file allocation enabled = YES
    Log retain for recovery status = RECOVERY
    User exit for logging status = NO
    Self tuning memory (SELF_TUNING_MEM) = ON
    Size of database shared memory (4KB) (DATABASE_MEMORY) = AUTOMATIC(1472088)
    Database memory threshold (DB_MEM_THRESH) = 10
    Max storage for lock list (4KB) (LOCKLIST) = AUTOMATIC(273120)
    Percent. of lock lists per application (MAXLOCKS) = AUTOMATIC(98)
    Package cache size (4KB) (PCKCACHESZ) = AUTOMATIC(40438)
    Sort heap thres for shared sorts (4KB) (SHEAPTHRES_SHR) = AUTOMATIC(17010)
    Sort list heap (4KB) (SORTHEAP) = AUTOMATIC(3402)
    Database heap (4KB) (DBHEAP) = AUTOMATIC(2579)
    Catalog cache size (4KB) (CATALOGCACHE_SZ) = 4096
    Log buffer size (4KB) (LOGBUFSZ) = 98
    Utilities heap size (4KB) (UTIL_HEAP_SZ) = 56710
    SQL statement heap (4KB) (STMTHEAP) = AUTOMATIC(8192)
    Default application heap (4KB) (APPLHEAPSZ) = AUTOMATIC(256)
    Application Memory Size (4KB) (APPL_MEMORY) = AUTOMATIC(40016)
    Statistics heap size (4KB) (STAT_HEAP_SZ) = AUTOMATIC(4384)
    Interval for checking deadlock (ms) (DLCHKTIME) = 10000
    Lock timeout (sec) (LOCKTIMEOUT) = 45
    Changed pages threshold (CHNGPGS_THRESH) = 80
    Number of asynchronous page cleaners (NUM_IOCLEANERS) = AUTOMATIC(2)
    Number of I/O servers (NUM_IOSERVERS) = AUTOMATIC(16)
    Sequential detect flag (SEQDETECT) = YES
    Default prefetch size (pages) (DFT_PREFETCH_SZ) = AUTOMATIC
    Track modified pages (TRACKMOD) = NO
    Default number of containers = 1
    Default tablespace extentsize (pages) (DFT_EXTENT_SZ) = 32
    Max number of active applications (MAXAPPLS) = AUTOMATIC(57)
    Average number of active applications (AVG_APPLS) = AUTOMATIC(1)
    Max DB files open per application (MAXFILOP) = 61440
    Log file size (4KB) (LOGFILSIZ) = 128000
    Number of primary log files (LOGPRIMARY) = 12
    Number of secondary log files (LOGSECOND) = 10
    Changed path to log files (NEWLOGPATH) =
    Path to log files = /home/db2inst1/db2inst1/NODE0000/SQL00001/LOGSTREAM0000/
    Overflow log path (OVERFLOWLOGPATH) =
    Mirror log path (MIRRORLOGPATH) =
    First active log file = S0000897.LOG
    Block log on disk full (BLK_LOG_DSK_FUL) = NO
    Block non logged operations (BLOCKNONLOGGED) = NO
    Percent max primary log space by transaction (MAX_LOG) = 0
    Num. of active log files for 1 active UOW (NUM_LOG_SPAN) = 0
    Percent log file reclaimed before soft chckpt (SOFTMAX) = 520
    Target for oldest page in LBP (PAGE_AGE_TRGT_MCR) = 240
    HADR database role = STANDARD
    HADR local host name (HADR_LOCAL_HOST) =
    HADR local service name (HADR_LOCAL_SVC) =
    HADR remote host name (HADR_REMOTE_HOST) =
    HADR remote service name (HADR_REMOTE_SVC) =
    HADR instance name of remote server (HADR_REMOTE_INST) =
    HADR timeout value (HADR_TIMEOUT) = 120
    HADR target list (HADR_TARGET_LIST) =
    HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC
    HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = 0
    HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0
    HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 0
    First log archive method (LOGARCHMETH1) = LOGRETAIN
    Archive compression for logarchmeth1 (LOGARCHCOMPR1) = OFF
    Options for logarchmeth1 (LOGARCHOPT1) =
    Second log archive method (LOGARCHMETH2) = OFF
    Archive compression for logarchmeth2 (LOGARCHCOMPR2) = OFF
    Options for logarchmeth2 (LOGARCHOPT2) =
    Failover log archive path (FAILARCHPATH) =
    Number of log archive retries on error (NUMARCHRETRY) = 5
    Log archive retry Delay (secs) (ARCHRETRYDELAY) = 20
    Vendor options (VENDOROPT) =
    Auto restart enabled (AUTORESTART) = ON
    Index re-creation time and redo index build (INDEXREC) = RESTART
    Log pages during index build (LOGINDEXBUILD) = OFF
    Default number of loadrec sessions (DFT_LOADREC_SES) = 1
    Number of database backups to retain (NUM_DB_BACKUPS) = 12
    Recovery history retention (days) (REC_HIS_RETENTN) = 366
    Auto deletion of recovery objects (AUTO_DEL_REC_OBJ) = OFF
    TSM management class (TSM_MGMTCLASS) =
    TSM node name (TSM_NODENAME) =
    TSM owner (TSM_OWNER) =
    TSM password (TSM_PASSWORD) =
    Automatic maintenance (AUTO_MAINT) = ON
    Automatic database backup (AUTO_DB_BACKUP) = ON
    Automatic table maintenance (AUTO_TBL_MAINT) = ON
    Automatic runstats (AUTO_RUNSTATS) = ON
    Real-time statistics (AUTO_STMT_STATS) = OFF
    Statistical views (AUTO_STATS_VIEWS) = OFF
    Automatic sampling (AUTO_SAMPLING) = OFF
    Automatic reorganization (AUTO_REORG) = ON
    Auto-Revalidation (AUTO_REVAL) = DISABLED
    Currently Committed (CUR_COMMIT) = DISABLED
    CHAR output with DECIMAL input (DEC_TO_CHAR_FMT) = V95
    Enable XML Character operations (ENABLE_XMLCHAR) = YES
    WLM Collection Interval (minutes) (WLM_COLLECT_INT) = 0

    Monitor Collect Settings
    Request metrics (MON_REQ_METRICS) = NONE
    Activity metrics (MON_ACT_METRICS) = NONE
    Object metrics (MON_OBJ_METRICS) = NONE
    Routine data (MON_RTN_DATA) = NONE
    Routine executable list (MON_RTN_EXECLIST) = OFF
    Unit of work events (MON_UOW_DATA) = NONE
    UOW events with package list (MON_UOW_PKGLIST) = OFF
    UOW events with executable list (MON_UOW_EXECLIST) = OFF
    Lock timeout events (MON_LOCKTIMEOUT) = NONE
    Deadlock events (MON_DEADLOCK) = WITHOUT_HIST
    Lock wait events (MON_LOCKWAIT) = NONE
    Lock wait event threshold (MON_LW_THRESH) = 4294967295
    Number of package list entries (MON_PKGLIST_SZ) = 32
    Lock event notification level (MON_LCK_MSG_LVL) = 1

    SMTP Server (SMTP_SERVER) =
    SQL conditional compilation flags (SQL_CCFLAGS) =
    Section actuals setting (SECTION_ACTUALS) = NONE
    Connect procedure (CONNECT_PROC) =
    Adjust temporal SYSTEM_TIME period (SYSTIME_PERIOD_ADJ) = NO
    Log DDL Statements (LOG_DDL_STMTS) = NO
    Log Application Information (LOG_APPL_INFO) = NO
    Default data capture on new Schemas (DFT_SCHEMAS_DCC) = NO
    Default table organization (DFT_TABLE_ORG) = ROW
    Default string units (STRING_UNITS) = SYSTEM
    National character string mapping (NCHAR_MAPPING) = GRAPHIC_CU16
    Database is in write suspend state = NO
    Extended row size support (EXTENDED_ROW_SZ) = DISABLE
    Encryption Library for Backup (ENCRLIB) =
    Encryption Options for Backup (ENCROPTS) =
    Encrypted database = NO

alert.cfg: http://pastebin.com/AjPXD9mF

  • And both have DB2 installed as root, with appropriate permissions? And both have logging switched to something other than circular? (Personally I'd go with DISK over LOGRETAIN, since that lets AUTO_DEL_REC_OBJ automatically prune logs along with backups.) Commented May 26, 2015 at 12:47
  • @ChrisAldrich all yes. Commented May 26, 2015 at 13:39
  • I should at least receive "Backup in progress" notifications when I do a manual backup, right? That doesn't happen either. Commented May 26, 2015 at 13:40
  • You need to use db2 list utilities [show detail] in order to see if there is a backup in progress or not. (The show detail is optional.) Commented May 26, 2015 at 16:29
  • Not sure what the issue is (your config all looks fine to me), but if you've got a working server, my first suggestion would be to dump all the config on both machines to text files and diff them to see if you've missed anything. Commented May 26, 2015 at 16:57
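
The config comparison suggested in the last comment could be sketched roughly as below. This is only an illustration: the function names `normalize_cfg` and `diff_cfg` and the dump file names are made up here, and the dumps are assumed to have been produced on each server with commands such as `db2 get dbm cfg` and `db2 get db cfg for STAGE_DB` redirected to a file.

```shell
#!/bin/sh
# Sketch: compare two DB2 configuration dumps taken on different servers.
# Assumption: each dump was created beforehand, for example with
#   db2 get dbm cfg > dbm.good.txt      (file names are hypothetical)

# normalize_cfg FILE: squeeze runs of blanks and strip trailing blanks so
# purely cosmetic differences do not show up in the diff.
normalize_cfg() {
    sed -e 's/[[:space:]][[:space:]]*/ /g' -e 's/ $//' "$1"
}

# diff_cfg GOOD BAD: print the lines that differ between the two dumps.
# Exit status follows diff: 0 = identical, 1 = differences found.
diff_cfg() {
    good=$(mktemp) || return 2
    bad=$(mktemp) || { rm -f "$good"; return 2; }
    normalize_cfg "$1" > "$good"
    normalize_cfg "$2" > "$bad"
    diff "$good" "$bad"
    rc=$?
    rm -f "$good" "$bad"
    return $rc
}
```

Calling `diff_cfg dbm.good.txt dbm.bad.txt` prints only the parameters whose values differ, which keeps the output short enough to read even for the long db cfg listing.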

3 Answers


Has your database been explicitly activated (with ACTIVATE DATABASE)? DB2 will not evaluate whether a database is a candidate for automatic backups if it is not active.

Relying on having at least 1 connection to the database to keep the database activated is a recipe for pain.

That said, I moved away from relying on automatic backups a long time ago, instead relying on the consistency and control you get when using a scheduler like cron.
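
A cron-driven backup of the kind mentioned above might look something like the following sketch. It is not the answerer's actual script: `BACKUP_DIR`, the `DB2_PROFILE` location, the `DRY_RUN` switch and the function name `run_backup` are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch of a cron-driven online backup as an alternative to AUTO_DB_BACKUP.
# Assumptions (not from the original posts): BACKUP_DIR and the db2profile
# location are placeholders; adjust them for the real instance.
DB_NAME=${DB_NAME:-STAGE_DB}
BACKUP_DIR=${BACKUP_DIR:-/path/to/backups}
DB2_PROFILE=${DB2_PROFILE:-/home/db2inst1/sqllib/db2profile}
DRY_RUN=${DRY_RUN:-0}   # set to 1 to print the command instead of running it

run_backup() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "db2 backup database $DB_NAME online to $BACKUP_DIR include logs"
        return 0
    fi
    # cron starts jobs with a minimal environment, so load the instance
    # profile first; give up if it cannot be read.
    [ -r "$DB2_PROFILE" ] || return 1
    . "$DB2_PROFILE" || return 1
    db2 backup database "$DB_NAME" online to "$BACKUP_DIR" include logs
}
```

To use it from cron, the file would need a final `run_backup` line and an entry in the instance owner's crontab. Running it once with `DRY_RUN=1` shows the exact command before anything is scheduled.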

  • "ACTIVATE DATABASE": tried that, didn't help. "I moved away from relying on automatic backups": I had to do the same. Commented Jun 18, 2015 at 11:36

In my case the cause was buggy semaphores in RHEL. Strangely, running gstack on the frozen db2acd process was enough to unfreeze it.

    # workaround for http://www-01.ibm.com/support/docview.wss?uid=swg21694920
    echo 'gstack `pgrep -f ^db2acd` >/dev/null 2>&1' >/etc/cron.hourly/db2acd-wdog.cron
    chmod 755 /etc/cron.hourly/db2acd-wdog.cron

I was recently in the same situation and have just solved it by replacing the Data Studio-generated

    BackupCriteria numberOfFullBackups="1" timeSinceLastBackup="24" logSpaceConsumedSinceLastBackup="6400"

with

    BackupCriteria numberOfFullBackups="1" timeSinceLastBackup="22" logSpaceConsumedSinceLastBackup="1"

If you want to keep only a small number of backups, do not forget to set rec_his_retentn to 0.

Lastly, check authorizations.
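
As a rough sketch, the criteria change described in this answer could be scripted as below. The helper name `set_backup_criteria` and the file names are hypothetical, and the commented-out DB2 commands at the end reflect my understanding of how a policy file is applied (via `SYSPROC.AUTOMAINT_SET_POLICYFILE`, with the file placed under the instance's `sqllib/tmp`); treat them as assumptions to verify against the DB2 10.5 documentation.

```shell
#!/bin/sh
# Sketch: rewrite the BackupCriteria thresholds in an auto-backup policy file.

# set_backup_criteria FILE HOURS LOGSPACE
# Prints FILE to stdout with timeSinceLastBackup set to HOURS and
# logSpaceConsumedSinceLastBackup set to LOGSPACE; FILE itself is unchanged.
set_backup_criteria() {
    sed -e "s/timeSinceLastBackup=\"[0-9]*\"/timeSinceLastBackup=\"$2\"/" \
        -e "s/logSpaceConsumedSinceLastBackup=\"[0-9]*\"/logSpaceConsumedSinceLastBackup=\"$3\"/" \
        "$1"
}

# Assumed follow-up steps (hypothetical file names, not run here):
#   set_backup_criteria old_policy.xml 22 1 > ~/sqllib/tmp/new_policy.xml
#   db2 connect to STAGE_DB
#   db2 "call sysproc.automaint_set_policyfile('AUTO_BACKUP', 'new_policy.xml')"
#   db2 update db cfg for STAGE_DB using REC_HIS_RETENTN 0
```

Writing the result to a new file rather than editing in place keeps the old policy around in case the change needs to be reverted.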
