Zabbix template & script to monitor ZFS on Linux.
A single Python script emits all the information needed for discovery and data gathering as one JSON document. All items are defined as dependent items and extract their data using JSONPath queries.
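To illustrate how a dependent item can pull one value out of the combined JSON, here is a minimal Python sketch. It resolves only simple dotted or bracketed paths such as `$.pools['pool1'].usage`; Zabbix's own JSONPath preprocessing is far more capable. The helper name `extract` and the sample paths are assumptions made for illustration and are not taken from the template.

```python
import json
import re

# Trimmed sample of the script's output (values taken from the JSON example).
SAMPLE = json.loads("""
{
  "pools": {"pool1": {"name": "pool1", "frag": 5, "usage": 35, "online": 1}},
  "arc": {"l1": {"size": 5398891600, "free": 458843264}},
  "slab": 291337024
}
""")

# Matches either a .key segment or a ['key'] / ["key"] segment.
_SEGMENT = re.compile(r"""\.([A-Za-z_]\w*)|\[['"]([^'"]+)['"]\]""")


def extract(doc, path):
    """Resolve a simple JSONPath-like expression ($.a.b or $.a['b']) against doc.

    Hypothetical helper: it only walks nested objects by key.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    rest = path[1:]
    pos = 0
    node = doc
    while pos < len(rest):
        match = _SEGMENT.match(rest, pos)
        if match is None:
            raise ValueError(f"unsupported path segment at: {rest[pos:]!r}")
        key = match.group(1) or match.group(2)
        node = node[key]  # raises KeyError if the key is absent
        pos = match.end()
    return node


if __name__ == "__main__":
    print(extract(SAMPLE, "$.pools['pool1'].usage"))
    print(extract(SAMPLE, "$.arc.l1.size"))
    print(extract(SAMPLE, "$.slab"))
```

The bracket form is useful for keys that contain slashes, such as device names like `/dev/sda1`.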
JSON example:

    {
      "vdevs": {
        "/dev/sda1": {
          "name": "/dev/sda1",
          "size": 0,
          "alloc": 0,
          "free": 0,
          "frag": 0,
          "usage": 0,
          "online": 1,
          "errors": {
            "read": 0,
            "write": 0,
            "cksum": 0
          }
        }
      },
      "pools": {
        "pool1": {
          "name": "pool1",
          "size": 11957188952064,
          "alloc": 4227267283968,
          "free": 7729921668096,
          "frag": 5,
          "usage": 35,
          "dedup": 1.0,
          "scrub": 0,
          "resilver": 0,
          "online": 1
        }
      },
      "datasets": {
        "pool1": {
          "name": "pool1",
          "avail": 5502826989670,
          "used": 3162194032538,
          "compress": 1.05,
          "referenced": 38300
        }
      },
      "arc": {
        "l1": {
          "size": 5398891600,
          "hitrate": 91.56803867935483,
          "free": 458843264
        },
        "l2": {
          "usage": 142278352384,
          "usage_actual": 137375770112,
          "hitrate": 94.00152633863829,
          "bytes_read": 241515501568,
          "bytes_written": 25932241920,
          "io_error": 0,
          "cksum_bad": 0
        }
      },
      "slab": 291337024
    }

- Low level discovery of:
  - Pools
  - Datasets
  - Devices
  - L2ARC - items will be created only if it's present
- Items:
  - Pools: health, disk usage, fragmentation, deduplication, scrub, resilver
  - Datasets: disk usage, compression ratio
  - Devices: health, disk usage, error counts, fragmentation
  - SLAB usage
  - L1 & L2 ARC stats
- Triggers:
  - Pool & device health
  - Pool & dataset disk usage
  - High fragmentation
  - ARC usage
  - L2ARC errors & low hit rate
  - Scrub
  - Resilver
- Zabbix agent passive checks. Can be converted to active if needed.
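As a rough picture of the discovery side, the sketch below derives one low-level-discovery row per pool, dataset and device from the combined JSON, and emits an L2ARC row only when the `l2` section is present. The LLD macro names (`{#POOL}`, `{#DATASET}`, `{#VDEV}`, `{#L2ARC}`) and the function name are hypothetical; the real names are whatever `template_zfs.xml` defines, and the template may well do this mapping purely with JSONPath preprocessing inside Zabbix.

```python
import json


def discovery_rows(data):
    """Build LLD-style rows for each entity type found in the combined JSON.

    Macro names are placeholders chosen for illustration only.
    """
    rows = {
        "pools": [{"{#POOL}": name} for name in sorted(data.get("pools", {}))],
        "datasets": [{"{#DATASET}": name} for name in sorted(data.get("datasets", {}))],
        "vdevs": [{"{#VDEV}": name} for name in sorted(data.get("vdevs", {}))],
    }
    # L2ARC items should exist only when the host actually has an L2ARC.
    has_l2 = "l2" in data.get("arc", {})
    rows["l2arc"] = [{"{#L2ARC}": "l2"}] if has_l2 else []
    return rows


if __name__ == "__main__":
    sample = {
        "vdevs": {"/dev/sda1": {"online": 1}},
        "pools": {"pool1": {"usage": 35}},
        "datasets": {"pool1": {"used": 3162194032538}},
        "arc": {"l1": {"size": 5398891600}},  # no "l2" key: no L2ARC rows
        "slab": 291337024,
    }
    print(json.dumps(discovery_rows(sample), indent=2))
```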

- {$ZFS_DS_EXCLUDE} - Regexp to exclude the datasets during discovery
- {$ZFS_DS_INCLUDE} - Regexp to include the datasets during discovery
- {$ZFS_DS_USAGE_CRIT} - Dataset usage in % when crit alert is triggered
- {$ZFS_DS_USAGE_HIGH} - Dataset usage in % when high alert is triggered
- {$ZFS_POOL_EXCLUDE} - Regexp to exclude the pools during discovery
- {$ZFS_POOL_INCLUDE} - Regexp to include the pools during discovery
- {$ZFS_POOL_FRAG_HIGH} - Pool fragmentation in % when high alert is triggered
- {$ZFS_POOL_USAGE_CRIT} - Pool usage in % when crit alert is triggered
- {$ZFS_POOL_USAGE_HIGH} - Pool usage in % when high alert is triggered
- {$ZFS_VDEV_ERROR_THRESHOLD} - vdev and L2ARC error threshold when triggers fire
- {$ZFS_VDEV_EXCLUDE} - Regexp to exclude the vdevs during discovery
- {$ZFS_VDEV_INCLUDE} - Regexp to include the vdevs during discovery
- {$ZFS_ARC_META_THRESHOLD} - Alert when ARC meta usage % goes over this value
- {$ZFS_L2ARC_ERROR_THRESHOLD} - Error threshold for L2ARC errors
- {$ZFS_L2ARC_HITRATE_THRESHOLD} - Alert when L2ARC hit rate falls below this value
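The include/exclude and usage macros above can be read roughly as follows. This is a sketch under stated assumptions: Python's `re` stands in for Zabbix's regular expression engine, the default patterns and the threshold numbers are made-up placeholders, and the use of `>=` is a guess at how the triggers compare. The function names are hypothetical.

```python
import re


def is_discovered(name, include=r".*", exclude=r"^$"):
    """Keep an entity if it matches the include regexp and not the exclude one.

    The default patterns here are assumptions, not the template's defaults.
    """
    return (
        re.search(include, name) is not None
        and re.search(exclude, name) is None
    )


def usage_severity(usage_pct, high, crit):
    """Map a usage percentage to a severity label using two thresholds."""
    if usage_pct >= crit:
        return "crit"
    if usage_pct >= high:
        return "high"
    return "ok"


if __name__ == "__main__":
    datasets = ["pool1", "pool1/data", "pool1/docker/abc123"]
    # Hypothetical value for {$ZFS_DS_EXCLUDE}: skip anything under a docker path.
    kept = [d for d in datasets if is_discovered(d, exclude=r"/docker/")]
    print(kept)
    # Hypothetical values for {$ZFS_POOL_USAGE_HIGH} and {$ZFS_POOL_USAGE_CRIT}.
    print(usage_severity(35, high=80, crit=90))
```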
- Tested on OpenZFS 2.x; may also work with 0.8.x
- Tested on Zabbix 6.0, but should work on 4.2+
- Python 3
- Place zfs.conf in /etc/zabbix/zabbix_agentd.d
- Place zfs.py in /etc/zabbix/scripts. You can put it into any other place, but then you'll have to adjust zfs.conf
- Restart zabbix-agentd
- Import template_zfs.xml
- You're good to go
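Before importing the template, it can help to confirm that the script's output is well-formed. The sketch below checks a JSON string for the top-level keys seen in the JSON example. The function name `check_output` is hypothetical, and the assumption that `zfs.py` prints its JSON to standard output when run by hand should be verified against the script itself.

```python
import json

# Top-level keys that appear in the JSON example.
EXPECTED_KEYS = ("vdevs", "pools", "datasets", "arc", "slab")


def check_output(text):
    """Return a list of problems found in the script output (empty if none)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    return [
        f"missing top-level key: {key}"
        for key in EXPECTED_KEYS
        if key not in data
    ]


if __name__ == "__main__":
    # Stand-in for captured script output; replace with the real text.
    captured = '{"vdevs": {}, "pools": {}, "datasets": {}, "arc": {}, "slab": 0}'
    problems = check_output(captured)
    print("OK" if not problems else "\n".join(problems))
```

An empty list means the structure looks as expected; it says nothing about whether the values themselves are correct.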