@waitingsong waitingsong commented Mar 31, 2025

feat(node): zfs monitor and grafana ui

Installation

Add parameters:

  • zfs_exporter_enabled: set up zfs_exporter on this node; false by default
  • zfs_exporter_version: currently 3.8.1
  • zfs_exporter_port: zfs_exporter listen port; 9134 by default
  • zfs_exporter_options: see "/roles/node_monitor/defaults/main.yml"
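
A minimal sketch of how these parameters might be set for a node, assuming they are ordinary YAML variables like the defaults in the referenced `main.yml`. Where the variables live (inventory, group vars, or elsewhere) is an assumption, and `zfs_exporter_options` is left out because its accepted values are defined in that file.

```yaml
# Hypothetical node variables; only the names and default values come from the list above.
zfs_exporter_enabled: true      # default is false, so the exporter must be enabled explicitly
zfs_exporter_version: 3.8.1     # the "current" version named in this PR
zfs_exporter_port: 9134         # default listen port
```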

Prometheus

Add zfs_exporter (forked from pdf/zfs_exporter) with more metrics:

  • node:ins:zfs_pool_metrics
  • node:ins:zfs_dataset_metrics

Add alert rules:

  • ZPoolDegraded
  • ZPoolFaulted
  • ZPoolOffline
  • ZPoolUnavail
  • ZPoolRemoved
  • ZPoolSuspended
  • ZPoolReadonly
  • ZPoolSpaceFull
  • ZDatasetSpaceFull

Update alert rule:

  • NodeFsSpaceFull with fstype!="zfs"
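
A sketch of how the `fstype!="zfs"` matcher might be applied, assuming the rule is built on the standard node_exporter filesystem metrics. The 90% threshold and the exact expression are assumptions; the rule in this PR may be written against a recording rule instead. The excluded ZFS mounts are presumably meant to be covered by `ZPoolSpaceFull` and `ZDatasetSpaceFull`.

```yaml
groups:
  - name: node-fs-sketch          # hypothetical group name
    rules:
      - alert: NodeFsSpaceFull
        # fstype!="zfs" keeps ZFS mounts out of the generic filesystem rule.
        expr: |
          1 - node_filesystem_avail_bytes{fstype!="zfs"}
            / node_filesystem_size_bytes{fstype!="zfs"} > 0.90
```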

Grafana

Add panels:

  • node-zfs.json
    • list ZFS pools and datasets
    • search by Node ID, Pool, and Dataset
    • ZFS alert summary

Update panels:

@waitingsong

node-overview.json

[screenshot: zfs-2025-03-31_141247]

waitingsong commented Mar 31, 2025

node-zfs.json

[screenshot: zfs-2025-04-01_145502]

waitingsong commented Mar 31, 2025

node-zfs.json

Unhealthy pools are listed first.

[screenshot: zfs-2025-04-01_151730]

node-overview.json

[screenshot: zfs-2025-03-31_140046]

@waitingsong

node-alert.json

  • ZPoolDegraded
  • ZDatasetSpaceFull

[screenshot: zfs-2025-03-31_140112]

waitingsong commented Mar 31, 2025

ZPoolDegraded: jump from the alert page with the pool name applied as a filter

[screenshot: zfs-2025-03-31_140942]

@waitingsong

ZDatasetSpaceFull: jump from the alert page with the pool name and dataset name applied as filters

[screenshot: zfs-2025-03-31_140914]

@waitingsong force-pushed the zfs-monitor branch 11 times, most recently from cef6110 to 9fb1852 (April 1, 2025 08:04)
@waitingsong

zfs_exporter aliveness panels

[screenshot: zfs-2025-04-01_165111]

## Installation

Add parameters:

- `zfs_exporter_enabled`: set up zfs_exporter on this node; `false` by default
- `zfs_exporter_version`: currently `3.8.1`
- `zfs_exporter_port`: zfs_exporter listen port; `9134` by default
- `zfs_exporter_options`: see "/roles/node_monitor/defaults/main.yml"

## Prometheus

Add [zfs_exporter] (forked from [pdf/zfs_exporter]) with more metrics:

- `node:zfs:pool_metrics`
- `node:zfs:dataset_metrics`

Add alert rules:

- `ZPoolDegraded`
- `ZPoolFaulted`
- `ZPoolOffline`
- `ZPoolUnavail`
- `ZPoolRemoved`
- `ZPoolSuspended`
- `ZPoolReadonly`
- `ZPoolSpaceFull`
- `ZDatasetSpaceFull`

Update alert rule:

- `NodeFsSpaceFull` with `fstype!="zfs"`

## Grafana

Add panels:

- [node-zfs.json]
  - list ZFS pools and datasets
  - unhealthy pools listed first
  - search by `Node ID`, `Pool`, and `Dataset`
  - ZFS alert summary

Update panels:

- [node-overview.json](http://g.pigsty/d/node-overview/node-overview)
  - add `ZFS Pools` Row
  - unhealthy pools listed first
- [node-alert.json](http://g.pigsty/d/node-alert/node-alert)
  - show ZFS alerts
  - cell link to [node-overview.json] page with pool name and/or dataset name filter

[zfs_exporter]: https://github.com/waitingsong/zfs_exporter
[pdf/zfs_exporter]: https://github.com/pdf/zfs_exporter
[node-overview.json]: http://g.pigsty/d/node-overview/node-overview
[node-zfs.json]: http://g.pigsty/d/zfs-overview/node-zfs

@Vonng
Member

Vonng commented Apr 5, 2025

zfs_exporter has now been added to the pigsty-infra repo:

pgsty/infra-pkg@906ab53

@Vonng force-pushed the main branch 4 times, most recently from df9afaf to 47d9e6d (April 5, 2025 14:29)
@waitingsong force-pushed the zfs-monitor branch 5 times, most recently from d94bca3 to 04f70eb (April 10, 2025 06:05)
@waitingsong changed the title from "feat(node): zfs monitor and grafana ui" to "WIP: feat(node): zfs monitor and grafana ui" (Apr 10, 2025)
- node:ins:zfs_arc_utilization
- node:ins:zfs_arc_memory_ratio
- node:ins:zfs_arc_meta_usage
- node:ins:zfs_arc_hit_ratio
- node:ins:zfs_arc_hit_ratio_rate1m
- node:ins:zfs_arc_hit_ratio_rate5m
- node:ins:zfs_arc_usage_ratio
- node:ins:zfs_arc_pressure_ratio
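
For orientation, a hit-ratio rule of this kind is usually hits divided by hits plus misses over a rate window. The sketch below shows that shape for `node:ins:zfs_arc_hit_ratio_rate1m`, assuming the ARC counters come from node_exporter's ZFS collector (`node_zfs_arc_hits`, `node_zfs_arc_misses`). The PR does not state which source metrics or expressions it uses, so treat both as assumptions.

```yaml
groups:
  - name: zfs-arc-sketch          # hypothetical group name
    rules:
      - record: node:ins:zfs_arc_hit_ratio_rate1m
        # Assumption: ARC hit and miss counters as exposed by node_exporter's zfs collector.
        expr: |
          rate(node_zfs_arc_hits[1m])
            / (rate(node_zfs_arc_hits[1m]) + rate(node_zfs_arc_misses[1m]))
```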
@waitingsong force-pushed the zfs-monitor branch 2 times, most recently from 662b303 to 38deca0 (April 11, 2025 06:17)
- ARC Pressure
- ARC Memory Ratio
- ARC Utilization
- ARC Meta Usage
- ARC Dnode Size
- ARC Evict Skip
- ARC Hit (rate1m)
- ARC Hit Ratio (rate1m)
@waitingsong

node-instance.json
link: /d/node-instance/node-instance

[screenshot: zfs-2025-04-11_144225]

@waitingsong changed the title from "WIP: feat(node): zfs monitor and grafana ui" to "feat(node): zfs monitor and grafana ui" (Apr 11, 2025)

waitingsong commented Apr 12, 2025

node-alert.json
link: d/node-alert/node-alert

[screenshot: zfs-2025-04-12_215458]
