I am not qualified to say for sure, but the root problem of Samba not being able to determine sane numbers may lie in a defect somewhere in the upstream integration of ZFS and Samba. At the very least, Samba ought to be able to find the ZFS filesystem's used and avail properties, as you are trying to do by hand. All my systems figure this out just fine on their own.
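
If it does turn out that you have to hand Samba the numbers yourself, the usual hook is smb.conf's dfree command, which runs a script and reads back "total blocks" and "free blocks". Purely as a sketch (the dataset name is a placeholder, and this may not match how your setup is wired), such a script can simply ask ZFS for the same used and avail properties:

```
#!/bin/sh
# Hypothetical dfree helper for Samba: print total and available space
# in 1024-byte blocks. Samba passes the share-relative directory as $1;
# this sketch ignores it and queries a fixed dataset instead.
DATASET=tank/share            # placeholder -- substitute your dataset

used=$(zfs get -Hp -o value used "$DATASET")
avail=$(zfs get -Hp -o value available "$DATASET")

# Treat used+available as the "total". As discussed below, that total is
# really a floor rather than a fixed size, but it is what df-style tools expect.
echo "$(( (used + avail) / 1024 )) $(( avail / 1024 ))"
```

Wired up with something like dfree command = /usr/local/bin/zfs-dfree in the share definition. But treat that as a workaround; if the ZFS integration were behaving, none of it should be necessary.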

But at a more fundamental level, your post suggests that you realize, though your users may not, that ZFS is something of a paradigm shift in disk storage. One telling hint is that you're having to calculate the "Total" capacity of the filesystem. ZFS itself doesn't have this concept (at the filesystem level), and indeed, there is no size property that can be accessed by the zfs list utility.
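
You can see the gap directly: zfs list will happily report used, avail, and refer for a dataset, but there is no size column to ask for. (The dataset name and figures below are made up for illustration.)

```
$ zfs list -o name,used,avail,refer,mountpoint tank/share
NAME         USED  AVAIL  REFER  MOUNTPOINT
tank/share   1.2T   820G   1.2T  /tank/share

$ zfs list -o name,size tank/share    # there is no "size" property; this fails
```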

Another way to put it: take a generic 4 TB drive. What is its "capacity"? The subset of your users who call you asking about disk space will probably say "four terabytes." But in ZFS, that figure is only the physical capacity, the amount of raw disk space, and it's a minimum: it holds true only when the data is incompressible. If your data is compressible, then the total logical capacity of the drive (which may be what your users think of when they ask "How much more storage do we have?") is likely higher than 4 TB. Even using the old lz4 compression algorithm, I have filesystems that exceed 3x compression, one reaching almost 6x. At those ratios, the "4 TB" drive's capacity would be somewhere between 12 and 24 terabytes. With zstd compression, the compression ratios can go to double digits.
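
The ratios your existing data is achieving are easy to check with the compressratio property; the numbers below are invented, but the shape of the output is real:

```
$ zfs get -o name,value compressratio tank/home tank/logs
NAME       VALUE
tank/home  3.05x
tank/logs  5.80x
```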

The difficulty is that, while ZFS knows the compression ratio of the data already stored (used vs. logicalused), there is no way to determine how compressible the "available" space is until data is actually written to it. In other words, total = used + avail breaks down: used is known, but avail is only a minimum, so the true total could be more than used + avail, depending on your future data storage patterns.
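
All of those are queryable per dataset, which is useful for monitoring even though they can't be combined into a fixed total. A sketch, with a placeholder dataset and invented values:

```
$ zfs get -o property,value used,logicalused,available,compressratio tank/share
PROPERTY       VALUE
used           1.20T
logicalused    3.60T
available      820G
compressratio  3.00x
```

Here used + available is roughly 2 TB of physical space, but whether that ends up holding 2 TB or closer to 6 TB of user data depends entirely on how well the next writes compress.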

In ZFS filesystems, there is no such thing as "disk capacity" except in the context of the actual data that is going to be stored. When a filesystem with a known available value stores additional data, say a file 10G in size, that file could compress at 2:1 and consume only 5G of actual disk, so storing 10G of data just made your apparent "capacity" go up by 5G. This is why your users are confused. They don't understand that a drive with a nominal physical capacity of 4 TB could hold a logical capacity of 8 TB of actual user data, if that data is 2:1 compressible. Tracking the amount of available or total space in a compressible filesystem is always going to be a shifting target.
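
If you ever need to demonstrate the effect, a quick (if extreme) way is to write a highly compressible file on a dataset with compression enabled and watch available barely move. Paths and names below are placeholders; an all-zero file compresses to nearly nothing rather than the 2:1 of the example above, but it makes the point vividly:

```
$ zfs get -H -o value available tank/share     # note the starting value
$ dd if=/dev/zero of=/tank/share/demo.bin bs=1M count=10240   # a "10G" file
$ sync
$ zfs get -H -o value available tank/share     # drops by far less than 10G
$ ls -lh /tank/share/demo.bin                  # logical size: 10G
$ du -h  /tank/share/demo.bin                  # space actually consumed: tiny
```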

IMO you're on the right track when you say that you want to shift to reporting disk space metrics at the pool level. That is what I do in my uptime monitoring system, because regardless of the compression level, the output from zpool always reflects the actual numbers of bytes that are used (allocated) or not used (size - allocated) on the drive. Indeed, the pool level is where we finally find a "Size" property, something ZFS filesystems don't have. The amount used and amount available are slightly less meaningful, notably because they reflect RAID-Z parity, but the ratio of used/size is generally accurate:
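
(The pool name and figures below are illustrative, not from your system.)

```
$ zpool list -o name,size,allocated,free,capacity tank
NAME   SIZE  ALLOC   FREE   CAP
tank  3.62T  1.21T  2.41T   33%

$ # -H (no headers) and -p (exact bytes) make it easy to feed a monitor:
$ zpool list -Hp -o allocated,size tank | awk '{ printf "%.1f%% of pool used\n", 100*$1/$2 }'
33.4% of pool used
```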

At the filesystem level, by contrast, the used vs. available numbers will always be squishy, because the data itself is squishy. ZFS filesystems can be (and often are) configured to compress data on the fly, without the user being aware of it. So a user might see a volume with 100G available, save a compressible 10G file, and then be baffled when, instead of dropping to 90G free, the volume still shows 95G free. They'll wonder how the server's disk array managed to store a 10G file while the "available" space decreased by only 5G. It's because the compression is transparent to the user.
