wal-g-exporter is a Prometheus exporter that gathers WAL-G backup metrics for Postgres databases. It works particularly well with the Zalando Spilo container and is configured by default to run in a Kubernetes environment as a sidecar container of Spilo (see Features and limitations).
As mentioned above, wal-g-exporter works well alongside Spilo, including Patroni cluster configurations with Spilo. wal-g-exporter knows whether it is running on a primary instance. It does not collect or export metrics for standby / follower instances, and it detects a role change while running. This ensures that you can always collect metrics from all exporters in the Postgres cluster without unwanted accumulation of metrics across the exporters.
For a list of all exported metrics and a description of them, see the metrics overview.
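The primary check described above can be illustrated with a minimal sketch. It assumes a DB-API style cursor (for example from psycopg2) and uses the built-in PostgreSQL function `pg_is_in_recovery()`; the names `is_primary`, `run_cycle` and `collect` are hypothetical and not taken from the wal-g-exporter source.

```python
from typing import Callable


def is_primary(cursor) -> bool:
    """Return True when the connected instance is not in recovery."""
    # pg_is_in_recovery() is built into PostgreSQL: it returns true on a
    # standby / follower and false on the primary.
    cursor.execute("SELECT NOT pg_is_in_recovery()")
    row = cursor.fetchone()
    return bool(row[0])


def run_cycle(cursor, collect: Callable[[], None]) -> bool:
    """Run one update cycle; metrics are collected only on the primary.

    `collect` is a hypothetical callback standing in for the actual
    metric collection. Returns True if metrics were collected.
    """
    if is_primary(cursor):
        collect()
        return True
    return False
```

Because the role is checked again at the start of every cycle, a failover is picked up on the next cycle without restarting the exporter.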
wal-g-exporter relies on having all needed WAL-G environment variables within an envdir under /run/etc/wal-e.d/env. This is the default behavior of the Spilo image when you set the WAL-G environment variables (e.g. AWS_ACCESS_KEY_ID, AWS_ENDPOINT...). The entrypoint of the image
The following environment variables can be used to configure wal-g-exporter.
| Variable name | description | default value |
|---|---|---|
| HTTP_PORT | Port on which the HTTP process of wal-g-exporter exposes metrics | 9351 |
| PGHOST | Hostname or IP of the Postgres instance to monitor metrics on | localhost |
| PGPORT | Port of the Postgres instance to monitor metrics on | 5432 |
| PGUSER | Username with which to connect to the Postgres instance | postgres |
| PGDATABASE | Database name of the Postgres instance to connect to | postgres |
| PGPASSWORD | Password of the above configured user | |
| PGSSLMODE | SSL mode of the Postgres connection | require |
| WAL_G_SCRAPE_INTERVAL | Scrape interval of the exporter | 60 |
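
To illustrate how the variables in the table could be read with their defaults, here is a minimal sketch. The `ExporterConfig` class and the `load_config` function are hypothetical names and do not come from the wal-g-exporter source; only the variable names and default values are taken from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ExporterConfig:
    http_port: int
    pghost: str
    pgport: int
    pguser: str
    pgdatabase: str
    pgpassword: Optional[str]  # the table lists no default value
    pgsslmode: str
    scrape_interval: int  # the table does not state the unit


def load_config(env: Mapping[str, str] = os.environ) -> ExporterConfig:
    """Build the configuration from environment variables.

    Unset variables fall back to the defaults from the table.
    """
    return ExporterConfig(
        http_port=int(env.get("HTTP_PORT", "9351")),
        pghost=env.get("PGHOST", "localhost"),
        pgport=int(env.get("PGPORT", "5432")),
        pguser=env.get("PGUSER", "postgres"),
        pgdatabase=env.get("PGDATABASE", "postgres"),
        pgpassword=env.get("PGPASSWORD"),
        pgsslmode=env.get("PGSSLMODE", "require"),
        scrape_interval=int(env.get("WAL_G_SCRAPE_INTERVAL", "60")),
    )
```

Calling `load_config({})` yields the defaults, while `load_config({"PGPORT": "6432"})` overrides only the Postgres port.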
Below is an example sidecar configuration for running wal-g-exporter within a Spilo pod. Most of the configuration is straightforward, with one thing to note: to share the envdir /run/etc/wal-e.d/env between the Spilo container and wal-g-exporter, you also need to mount the volume walg (as it is named in this example) into the Spilo container. Spilo takes care of the content of this directory.
```yaml
...
  - env:
    - name: POSTGRES_USER
      value: postgres
    - name: POSTGRES_PASSWORD
      valueFrom:
        secretKeyRef:
          key: password
          name: postgres.postgres-backup-exporter.credentials.postgresql.acid.zalan.do
    - name: PGHOST
      value: 127.0.0.1
    - name: PGPORT
      value: "5432"
    - name: PGPASSWORD
      valueFrom:
        secretKeyRef:
          key: password
          name: postgres.postgres-backup-exporter.credentials.postgresql.acid.zalan.do
    - name: PGUSER
      valueFrom:
        secretKeyRef:
          key: username
          name: postgres.postgres-backup-exporter.credentials.postgresql.acid.zalan.do
    image: ghcr.io/thedatabaseme/wal-g-prometheus-exporter:latest
    imagePullPolicy: IfNotPresent
    name: backup-exporter
    volumeMounts:
    - mountPath: /home/postgres/pgdata
      name: pgdata
    - mountPath: /run/etc
      name: walg
...
  volumes:
  - emptyDir:
      medium: Memory
    name: dshm
  - emptyDir: {}
    name: walg
...
```

An example log output of wal-g-exporter looks like this:
```
2023-05-06 10:12:49,413 - Is NOT in recovery mode? True
2023-05-06 10:12:49,413 - Connected to primary database
2023-05-06 10:12:49,413 - Evaluating wal-g backups...
2023-05-06 10:12:49,413 - Updating basebackup metrics...
2023-05-06 10:12:49,439 - 4 basebackups found (first: 2023-05-06T09:57:42.19257Z, last: 2023-05-06T10:10:01.951179Z)
2023-05-06 10:12:49,440 - Last basebackup duration: 1.3355910778045654
2023-05-06 10:12:49,440 - Finished updating basebackup metrics...
2023-05-06 10:12:49,440 - Updating WAL archive metrics...
2023-05-06 10:12:49,478 - WAL integrity status is: OK
2023-05-06 10:12:49,478 - Found 7 WAL archives in 1 timelines, 0 WAL archives missing
2023-05-06 10:12:49,479 - Finished updating WAL archive metrics...
2023-05-06 10:12:49,479 - Updating S3 disk usage...
2023-05-06 10:12:49,508 - S3 diskusage in bytes: 31420173
2023-05-06 10:12:49,508 - Finished updating S3 metrics...
2023-05-06 10:12:49,508 - All metrics collected. Waiting for next update cycle...
```

An example dashboard can be found under grafana/dashboard.json.
This project has its roots in the camptocamp/wal-g-prometheus-exporter project. I took both the idea and a fair amount of code from there. Since that project has not received contributions in quite some time and I had a different approach to writing the code, I decided to start over, using its code as a basis. So, many kudos to camptocamp.