If your server is well configured, most of the storage IO should be write IO, and all read IO except the (big) first one should be served from RAM: the server should almost exclusively issue write IO, to persist to disk the blocks that were modified in RAM.
As PostgreSQL constantly performs small write IO (sized according to its page size), it would not be surprising for all the PostgreSQL data files to reach a high fragmentation level.
I noticed the same kind of "issue" (losing more than 28% of a terabyte BTRFS filesystem) in a setup where VMs are stored on one big BTRFS filesystem, each VM uses COW VMDK files as disks, and some VMs run databases (notably MariaDB / PostgreSQL).
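If you want to confirm the fragmentation on your own files, filefrag (from e2fsprogs, it also works on BTRFS) reports how many extents a file is split into; the VMDK path below is only a placeholder:

$ # a large extent count on a COW filesystem indicates heavy fragmentation
$ sudo filefrag /mnt/vm/some-vm/some-disk.vmdk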
The way I recovered most of this space was to run:
$ sudo btrfs balance start -musage=100 -dusage=100 /mnt/vm
$ sudo btrfs filesystem defrag -r -f -v /mnt/vm
and then to run a balance again:
$ sudo btrfs balance start -musage=100 -dusage=100 /mnt/vm
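A full balance on a nearly full terabyte filesystem takes a while; it can be followed from another shell with the standard btrfs-progs subcommands:

$ # shows how many block group chunks are left to relocate
$ sudo btrfs balance status /mnt/vm
$ # and how the allocation evolves while it runs
$ sudo btrfs fi usage /mnt/vm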
This way I recovered most of the "lost" space.
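If a one-shot -dusage=100 / -musage=100 balance is too heavy for your IO load, a gentler variant (a sketch, not what I ran here) is to step the usage filter up so that mostly-empty chunks are compacted first:

$ for u in 10 25 50 75 100; do sudo btrfs balance start -dusage=$u -musage=$u /mnt/vm; done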
Please also note that before doing any of this you should read the btrfs documentation, in particular the btrfs-filesystem(8) and btrfs-balance(8) man pages.
Here are the results in my case:
Actual data (identical in the original and final states):
$ sudo btrfs fi du -s /mnt/vm
     Total   Exclusive  Set shared  Filename
 669.62GiB   669.22GiB   401.46MiB  /mnt/vm
BTRFS filesystem usage (original state):
$ sudo btrfs fi usage /mnt/vm
Overall:
    Device size:                1000.00GiB
    Device allocated:            955.07GiB
    Device unallocated:           44.93GiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                        950.71GiB
    Free (estimated):             46.74GiB      (min: 24.28GiB)
    Free (statfs, df):            46.74GiB
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,single: Size:949.01GiB, Used:947.19GiB (99.81%)
   /dev/mapper/vgu2nvme-lvvm   949.01GiB

Metadata,DUP: Size:3.00GiB, Used:1.76GiB (58.56%)
   /dev/mapper/vgu2nvme-lvvm     6.00GiB

System,DUP: Size:32.00MiB, Used:144.00KiB (0.44%)
   /dev/mapper/vgu2nvme-lvvm    64.00MiB

Unallocated:
   /dev/mapper/vgu2nvme-lvvm    44.93GiB
Final state after all operations (the first balance alone only recovered about 10 GiB):
$ sudo btrfs fi usage /mnt/vm
Overall:
    Device size:                1000.00GiB
    Device allocated:            711.07GiB
    Device unallocated:          288.93GiB
    Device missing:                  0.00B
    Device slack:                    0.00B
    Used:                        708.02GiB
    Free (estimated):            291.72GiB      (min: 147.26GiB)
    Free (statfs, df):           291.72GiB
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,single: Size:709.01GiB, Used:706.21GiB (99.61%)
   /dev/mapper/vgu2nvme-lvvm   709.01GiB

Metadata,DUP: Size:1.00GiB, Used:926.33MiB (90.46%)
   /dev/mapper/vgu2nvme-lvvm     2.00GiB

System,DUP: Size:32.00MiB, Used:144.00KiB (0.44%)
   /dev/mapper/vgu2nvme-lvvm    64.00MiB

Unallocated:
   /dev/mapper/vgu2nvme-lvvm   288.93GiB
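To put numbers on the gain: Device allocated dropped from 955.07GiB to 711.07GiB, so the defrag plus the two balances returned about 244GiB to unallocated space. Data used is still 706.21GiB for 669.62GiB of actual data, i.e. roughly 37GiB, about 4% of the 1000GiB device, remains unaccounted for.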
So not perfect (about 4% of the lost space is still not recovered), but that is the best I managed to achieve!
NB: all these operations were done online, with the filesystem mounted and about ~20 VMs running on it. Maybe the only way to recover the last 4% would be to do a cold copy of the data to a newly formatted BTRFS filesystem (i.e. stopping the ~20 production VMs and doing cp -a)...
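For reference, such a cold migration would look roughly like this (the target LV and mount point names are hypothetical, and every VM must be stopped first):

$ # format a spare volume as a fresh BTRFS filesystem (hypothetical LV name)
$ sudo mkfs.btrfs /dev/mapper/vgu2nvme-lvvm2
$ sudo mount /dev/mapper/vgu2nvme-lvvm2 /mnt/vm-new
$ # -a preserves ownership, permissions, timestamps and symlinks
$ sudo cp -a /mnt/vm/. /mnt/vm-new/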
So if someone knows how to recover the last 4% of lost space without copying the data to another filesystem, it would help a lot.