4 files changed: +25 -45 lines changed
prometheus/common/alert-rules.d

+---
+minor_changes:
+  - prometheus - Remove unnecessary absence alerts. The general ExporterDown metric can cover these scenarios
+  - prometheus - Moved the ExporterDown alert to its own common alerts file and have it be enabled by default (no .example extension on the file name)

+###
+#
+# Copyright © 2017-2025 Crunchy Data Solutions, Inc. All Rights Reserved.
+#
+###
+
+groups:
+- name: alert-rules
+  rules:
+
+########## COMMON RULES ##########
+  - alert: ExporterDown
+    expr: avg_over_time(up[5m]) < 0.5
+    for: 10s
+    labels:
+      service: system
+      severity: critical
+      severity_num: 300
+    annotations:
+      description: 'Metrics exporter service for {{ $labels.job }} running on {{ $labels.instance }} has been down at least 50% of the time for the last 5 minutes. Service may be flapping or down.'
+      summary: 'Prometheus Exporter Service Down'
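
Note on the rule above: `up` is recorded by Prometheus for every scrape target and carries the `job` and `instance` labels, so one rule can identify which exporter is down. `absent()` only returns labels taken from equality matchers in its selector, which is why the removed absence alerts had to be written once per job. A minimal `promtool test rules` sketch for checking the new alert follows. The rules file path, the `Prod` job, and the `db1:9187` instance are placeholders that do not come from this change, and the expected annotations must match the rendered text of the rule file exactly.

```yaml
# exporter_down_test.yml (hypothetical file name)
rule_files:
  - common-alert-rules.yml   # placeholder: path to the rules file added above

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Scrapes succeed for the first 5 samples, then fail from 5m through 15m.
      - series: 'up{job="Prod", instance="db1:9187"}'
        values: '1+0x4 0+0x10'
    alert_rule_test:
      # By 15m the 5-minute average of `up` is 0, well past the 10s `for` window.
      - eval_time: 15m
        alertname: ExporterDown
        exp_alerts:
          - exp_labels:
              job: Prod
              instance: db1:9187
              service: system
              severity: critical
              severity_num: "300"
            exp_annotations:
              description: 'Metrics exporter service for Prod running on db1:9187 has been down at least 50% of the time for the last 5 minutes. Service may be flapping or down.'
              summary: 'Prometheus Exporter Service Down'
```

Run it with `promtool test rules exporter_down_test.yml`, after adjusting the placeholder path to wherever the rules file is installed.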

@@ -56,36 +56,3 @@ groups:
 #     severity_num: 300
 #   annotations:
 #     description: 'The expected minimum count of etcd nodes was not found. Current count {{ $value }}'
-
-# Absence alerts must be configured per named job, otherwise there's no way to know which job is down
-# Below is are some examples using the leader metric for a targets called "etcd#" for a 3 node etcd cluster
-
-# - alert: ETCDAbsent_etcd1
-#   expr: absent(etcd_server_has_leader{job="ip11_etcd1"})
-#   for: 10s
-#   labels:
-#     service: etcd
-#     severity: critical
-#     severity_num: 300
-#   annotations:
-#     description: 'Leader metric is absent from target {{ $labels.job }}. Check that etcd is running on target host.'
-
-# - alert: ETCDAbsent_etcd2
-#   expr: absent(etcd_server_has_leader{job="ip21_etcd2"})
-#   for: 10s
-#   labels:
-#     service: etcd
-#     severity: critical
-#     severity_num: 300
-#   annotations:
-#     description: 'Leader metric is absent from target {{ $labels.job }}. Check that etcd is running on target host.'
-
-# - alert: ETCDAbsent_etcd3
-#   expr: absent(etcd_server_has_leader{job="ip31_etcd3"})
-#   for: 10s
-#   labels:
-#     service: etcd
-#     severity: critical
-#     severity_num: 300
-#   annotations:
-#     description: 'Leader metric is absent from target {{ $labels.job }}. Check that etcd is running on target host.'

@@ -164,18 +164,6 @@ groups:
 #     summary: '{{ $labels.job }} has changed from replica to primary'
 
 
-## Absence alerts must be configured per named job, otherwise there's no way to know which job is down
-## Below is an example for a target job called "Prod"
-# - alert: PGConnectionAbsent_Prod
-#   expr: absent(ccp_connection_stats_max_connections{job="Prod"})
-#   for: 10s
-#   labels:
-#     service: postgresql
-#     severity: critical
-#     severity_num: 300
-#   annotations:
-#     description: 'Connection metric is absent from target (Prod). Check that postgres_exporter can connect to PostgreSQL.'
-
 
 ## Optional monitor for changes to pg_settings (postgresql.conf) system catalog.
 ## A similar metric is available for monitoring pg_hba.conf. See ccp_hba_settings_checksum.