Commit c11a5a6

feat: remove absence alerts. make common alert file (#475)
1 parent 3e87750 commit c11a5a6

4 files changed: 25 additions, 45 deletions
changelogs/fragments/461.yml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+---
+minor_changes:
+  - prometheus - Remove unnecessary absence alerts. The general ExporterDown metric can cover these scenarios
+  - prometheus - Moved the ExporterDown alert to its own common alerts file and have it be enabled by default (no .example extension on the file name)
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+###
+#
+# Copyright © 2017-2025 Crunchy Data Solutions, Inc. All Rights Reserved.
+#
+###
+
+groups:
+- name: alert-rules
+  rules:
+
+########## COMMON RULES ##########
+  - alert: ExporterDown
+    expr: avg_over_time(up[5m]) < 0.5
+    for: 10s
+    labels:
+      service: system
+      severity: critical
+      severity_num: 300
+    annotations:
+      description: 'Metrics exporter service for {{ $labels.job }} running on {{ $labels.instance }} has been down at least 50% of the time for the last 5 minutes. Service may be flapping or down.'
+      summary: 'Prometheus Exporter Service Down'
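
The `ExporterDown` expression above averages the `up` series (1 for a successful scrape, 0 for a failed one) over a trailing five-minute window and triggers when that average drops below 0.5. The following is a minimal Python sketch of that arithmetic, not the Prometheus implementation. It assumes one scrape per minute, so the window holds five samples, and it ignores the `for: 10s` hold period. The function names `trailing_average` and `exporter_down` are hypothetical.

```python
from typing import List


def trailing_average(samples: List[int], window: int) -> List[float]:
    """Mean of the last `window` samples at each position (fewer at the start)."""
    averages = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1): i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


def exporter_down(samples: List[int], window: int = 5, threshold: float = 0.5) -> List[bool]:
    """True wherever the trailing average of `up` is strictly below the threshold."""
    return [avg < threshold for avg in trailing_average(samples, window)]


# Assumed scrape results, one per minute: 1 = scrape succeeded, 0 = scrape failed.
single_miss = [1, 1, 0, 1, 1]
goes_down = [1, 1, 1, 1, 1, 0, 0, 0]

print(exporter_down(single_miss))
print(exporter_down(goes_down))
```

Under these assumptions a single failed scrape never pulls the average below 0.5, while a target that stays down crosses the threshold only once more than half of the samples in the window are failures.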

prometheus/common/alert-rules.d/crunchy-alert-rules-etcd.yml.example

Lines changed: 0 additions & 33 deletions
@@ -56,36 +56,3 @@ groups:
 # severity_num: 300
 # annotations:
 # description: 'The expected minimum count of etcd nodes was not found. Current count {{ $value }}'
-
-# Absence alerts must be configured per named job, otherwise there's no way to know which job is down
-# Below is are some examples using the leader metric for a targets called "etcd#" for a 3 node etcd cluster
-
-# - alert: ETCDAbsent_etcd1
-# expr: absent(etcd_server_has_leader{job="ip11_etcd1"})
-# for: 10s
-# labels:
-# service: etcd
-# severity: critical
-# severity_num: 300
-# annotations:
-# description: 'Leader metric is absent from target {{ $labels.job }}. Check that etcd is running on target host.'
-
-# - alert: ETCDAbsent_etcd2
-# expr: absent(etcd_server_has_leader{job="ip21_etcd2"})
-# for: 10s
-# labels:
-# service: etcd
-# severity: critical
-# severity_num: 300
-# annotations:
-# description: 'Leader metric is absent from target {{ $labels.job }}. Check that etcd is running on target host.'
-
-# - alert: ETCDAbsent_etcd3
-# expr: absent(etcd_server_has_leader{job="ip31_etcd3"})
-# for: 10s
-# labels:
-# service: etcd
-# severity: critical
-# severity_num: 300
-# annotations:
-# description: 'Leader metric is absent from target {{ $labels.job }}. Check that etcd is running on target host.'

prometheus/common/alert-rules.d/crunchy-alert-rules-pg.yml.example

Lines changed: 0 additions & 12 deletions
@@ -164,18 +164,6 @@ groups:
 # summary: '{{ $labels.job }} has changed from replica to primary'
 
 
-## Absence alerts must be configured per named job, otherwise there's no way to know which job is down
-## Below is an example for a target job called "Prod"
-# - alert: PGConnectionAbsent_Prod
-# expr: absent(ccp_connection_stats_max_connections{job="Prod"})
-# for: 10s
-# labels:
-# service: postgresql
-# severity: critical
-# severity_num: 300
-# annotations:
-# description: 'Connection metric is absent from target (Prod). Check that postgres_exporter can connect to PostgreSQL.'
-
 
 ## Optional monitor for changes to pg_settings (postgresql.conf) system catalog.
 ## A similar metric is available for monitoring pg_hba.conf. See ccp_hba_settings_checksum.
