34

I have a very high density virtualized environment with containers, so I'm trying to make each container really small. "Really small" means 87 MB on base Ubuntu 14.04 (Trusty Tahr) without breaking package manager compatibility.

So I use LVM as backing storage for my containers, and recently I noticed some very strange numbers. Here they are.

Let's create a 100 MiB (yeah, power of 2) logical volume.

sudo lvcreate -L100M -n test1 /dev/purgatory 

I'd like to check the size, so I issue sudo lvs --units k

test1 purgatory -wi-a---- 102400.00k 

Sweet, this is really 100 MiB.

Now let's make an ext4 filesystem on it. And of course, we remember the -m 0 parameter, which prevents space from being wasted on root-reserved blocks.

sudo mkfs.ext4 -m 0 /dev/purgatory/test1
mke2fs 1.42.9 (4-Feb-2014)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
25688 inodes, 102400 blocks
0 blocks (0.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=67371008
13 block groups
8192 blocks per group, 8192 fragments per group
1976 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729

Allocating group tables: done
Writing inode tables: done
Creating journal (4096 blocks): done
Writing superblocks and filesystem accounting information: done

Sweet and clean. Mind the block size: our logical volume is small, so mkfs.ext4 chose a 1 KiB block size rather than the usual 4 KiB.
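If you want to double-check the block size the superblock actually records, tune2fs will show it (a quick sketch, assuming e2fsprogs is installed):

    sudo tune2fs -l /dev/purgatory/test1 | grep 'Block size'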

Now we will mount it.

sudo mount /dev/purgatory/test1 /mnt/test1 

And let's call df without parameters (we would like to see 1 KiB-blocks)

/dev/mapper/purgatory-test1 95054 1550 91456 2% /mnt/test1 

Wait, oh shi~

We have 95054 blocks total. But the device itself has 102400 blocks of 1 KiB. We have only 92.8% of our storage. Where are my blocks, man?

Let's look at this on a real block device too. I have a 16 GiB virtual disk, 16777216 blocks of 1K, but only 15396784 blocks show up in the df output. That's 91.7%. What is going on?
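For comparing the raw device with what df reports, it helps to get the device size in 1K blocks directly (a sketch; blockdev reports bytes, so divide by 1024):

    # raw size of the volume, in 1 KiB blocks
    echo $(( $(sudo blockdev --getsize64 /dev/purgatory/test1) / 1024 ))
    # versus what the mounted filesystem exposes
    df -k /mnt/test1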

Now follows the investigation (spoiler: no results)

  1. Maybe the filesystem doesn't start at the beginning of the device. That would be strange, but possible. Luckily, ext4 has magic bytes, so let's check that they are there.

    sudo hexdump -C /dev/purgatory/test1 | grep "53 ef"

This shows the superblock:

00000430 a9 10 e7 54 01 00 ff ff 53 ef 01 00 01 00 00 00 |...T....S.......| 

Hex 0x430 = decimal 1072, so somewhere after the first kilobyte. That looks reasonable: ext4 skips the first 1024 bytes for oddities like a VBR, etc.
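If you want a narrower check than grepping the whole hexdump: the magic sits at offset 0x38 (56) inside the superblock, and the superblock starts at byte 1024, so the two bytes live at absolute offset 1080 (a sketch):

    # should print the little-endian magic 53 ef
    sudo dd if=/dev/purgatory/test1 bs=1 skip=1080 count=2 2>/dev/null | hexdump -C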

  2. It's the journal!

No, it is not. The journal takes its space out of the Available column in the df output.
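You can read the journal size straight out of the superblock without mounting anything (a sketch using dumpe2fs -h, which prints only the superblock fields):

    sudo dumpe2fs -h /dev/purgatory/test1 | grep -i journal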

  3. Oh, we have dumpe2fs and can check the sizes there!

... a lot of greps ...

sudo dumpe2fs /dev/purgatory/test1 | grep "Free blocks" 

Ouch.

Free blocks:              93504
Free blocks: 3510-8192
Free blocks: 8451-16384
Free blocks: 16385-24576
Free blocks: 24835-32768
Free blocks: 32769-40960
Free blocks: 41219-49152
Free blocks: 53249-57344
Free blocks: 57603-65536
Free blocks: 65537-73728
Free blocks: 73987-81920
Free blocks: 81921-90112
Free blocks: 90113-98304
Free blocks: 98305-102399

And we have another number. 93504 free blocks.

The question is: what is going on?

  • Block device: 102400k (lvs says)
  • Filesystem size: 95054k (df says)
  • Free blocks: 93504k (dumpe2fs says)
  • Available size: 91456k (df says)
  • That's why I still use ext2 for small partitions. Commented Feb 20, 2015 at 13:52
  • @frostschutz ext2 looks reasonable here, sure Commented Feb 20, 2015 at 13:55

3 Answers

33

Try this: mkfs.ext4 -N 104 -m0 -O ^has_journal,^resize_inode /dev/purgatory/test1

I think this will help you understand what is going on.

-N 104 (sets the number of inodes the filesystem should have)

  • every inode "costs" usable space (128 bytes)

-m 0 (no reserved blocks)
-O ^has_journal,^resize_inode (deactivates the features has_journal and resize_inode)

  • resize_inode "costs" free space (most of the 1550 1K blocks / 2% you see in your df; 12K of it is used by the "lost+found" folder)
  • has_journal "costs" usable space (4096 1K blocks in your case)
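A sketch of how you might verify this on the same test volume (unmount and recreate the filesystem first; the rmdir assumes lost+found is still empty):

    sudo mkfs.ext4 -N 104 -m0 -O ^has_journal,^resize_inode /dev/purgatory/test1
    sudo mount /dev/purgatory/test1 /mnt/test1
    sudo rmdir /mnt/test1/lost+found    # frees the 12K it preallocates
    df -k /mnt/test1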

That gives us 102348 usable blocks out of 102400, with another 52 blocks unusable (once we have deleted the "lost+found" folder). To account for those, we dive into dumpe2fs:

Group 0: (Blocks 1-8192) [ITABLE_ZEROED]
  Checksum 0x5ee2, unused inodes 65533
  Primary superblock at 1, Group descriptors at 2-2
  Block bitmap at 3 (+2), Inode bitmap at 19 (+18)
  Inode table at 35-35 (+34)
  8150 free blocks, 0 free inodes, 1 directories, 65533 unused inodes
  Free blocks: 17-18, 32-34, 48-8192
  Free inodes: 
Group 1: (Blocks 8193-16384) [BLOCK_UNINIT, ITABLE_ZEROED]
  Checksum 0x56cf, unused inodes 5
  Backup superblock at 8193, Group descriptors at 8194-8194
  Block bitmap at 4 (+4294959107), Inode bitmap at 20 (+4294959123)
  Inode table at 36-36 (+4294959139)
  8190 free blocks, 6 free inodes, 0 directories, 5 unused inodes
  Free blocks: 8193-16384
  Free inodes: 11-16
Group 2: (Blocks 16385-24576) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Checksum 0x51eb, unused inodes 8
  Block bitmap at 5 (+4294950916), Inode bitmap at 21 (+4294950932)
  Inode table at 37-37 (+4294950948)
  8192 free blocks, 8 free inodes, 0 directories, 8 unused inodes
  Free blocks: 16385-24576
  Free inodes: 17-24
Group 3: (Blocks 24577-32768) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Checksum 0x3de1, unused inodes 8
  Backup superblock at 24577, Group descriptors at 24578-24578
  Block bitmap at 6 (+4294942725), Inode bitmap at 22 (+4294942741)
  Inode table at 38-38 (+4294942757)
  8190 free blocks, 8 free inodes, 0 directories, 8 unused inodes
  Free blocks: 24577-32768
  Free inodes: 25-32
Group 4: (Blocks 32769-40960) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Checksum 0x79b9, unused inodes 8
  Block bitmap at 7 (+4294934534), Inode bitmap at 23 (+4294934550)
  Inode table at 39-39 (+4294934566)
  8192 free blocks, 8 free inodes, 0 directories, 8 unused inodes
  Free blocks: 32769-40960
  Free inodes: 33-40
Group 5: (Blocks 40961-49152) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Checksum 0x0059, unused inodes 8
  Backup superblock at 40961, Group descriptors at 40962-40962
  Block bitmap at 8 (+4294926343), Inode bitmap at 24 (+4294926359)
  Inode table at 40-40 (+4294926375)
  8190 free blocks, 8 free inodes, 0 directories, 8 unused inodes
  Free blocks: 40961-49152
  Free inodes: 41-48
Group 6: (Blocks 49153-57344) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Checksum 0x3000, unused inodes 8
  Block bitmap at 9 (+4294918152), Inode bitmap at 25 (+4294918168)
  Inode table at 41-41 (+4294918184)
  8192 free blocks, 8 free inodes, 0 directories, 8 unused inodes
  Free blocks: 49153-57344
  Free inodes: 49-56
Group 7: (Blocks 57345-65536) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Checksum 0x5c0a, unused inodes 8
  Backup superblock at 57345, Group descriptors at 57346-57346
  Block bitmap at 10 (+4294909961), Inode bitmap at 26 (+4294909977)
  Inode table at 42-42 (+4294909993)
  8190 free blocks, 8 free inodes, 0 directories, 8 unused inodes
  Free blocks: 57345-65536
  Free inodes: 57-64
Group 8: (Blocks 65537-73728) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Checksum 0xf050, unused inodes 8
  Block bitmap at 11 (+4294901770), Inode bitmap at 27 (+4294901786)
  Inode table at 43-43 (+4294901802)
  8192 free blocks, 8 free inodes, 0 directories, 8 unused inodes
  Free blocks: 65537-73728
  Free inodes: 65-72
Group 9: (Blocks 73729-81920) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Checksum 0x50fd, unused inodes 8
  Backup superblock at 73729, Group descriptors at 73730-73730
  Block bitmap at 12 (+4294893579), Inode bitmap at 28 (+4294893595)
  Inode table at 44-44 (+4294893611)
  8190 free blocks, 8 free inodes, 0 directories, 8 unused inodes
  Free blocks: 73729-81920
  Free inodes: 73-80
Group 10: (Blocks 81921-90112) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Checksum 0x60a4, unused inodes 8
  Block bitmap at 13 (+4294885388), Inode bitmap at 29 (+4294885404)
  Inode table at 45-45 (+4294885420)
  8192 free blocks, 8 free inodes, 0 directories, 8 unused inodes
  Free blocks: 81921-90112
  Free inodes: 81-88
Group 11: (Blocks 90113-98304) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Checksum 0x28de, unused inodes 8
  Block bitmap at 14 (+4294877197), Inode bitmap at 30 (+4294877213)
  Inode table at 46-46 (+4294877229)
  8192 free blocks, 8 free inodes, 0 directories, 8 unused inodes
  Free blocks: 90113-98304
  Free inodes: 89-96
Group 12: (Blocks 98305-102399) [INODE_UNINIT, ITABLE_ZEROED]
  Checksum 0x9223, unused inodes 8
  Block bitmap at 15 (+4294869006), Inode bitmap at 31 (+4294869022)
  Inode table at 47-47 (+4294869038)
  4095 free blocks, 8 free inodes, 0 directories, 8 unused inodes
  Free blocks: 98305-102399
  Free inodes: 97-104

and count the used blocks (for Backup superblock, Group descriptors, Block bitmap, Inode bitmap and Inode table) or we grep and count:

LANG=C dumpe2fs /dev/mapper/vg_vms-test1 | grep ' at ' | grep -v ',' | wc -l 

which gives us the count of lines which have a single block (in our example) and

LANG=C dumpe2fs /dev/mapper/vg_vms-test1 | grep ' at ' | grep ',' | wc -l 

which gives us the count of lines which have two blocks (in our example).

So we have (in our example) 13 lines with one block each and 19 lines with two blocks each.

13+19*2 

which gives us 51 blocks which are in use by ext4 itself. That leaves exactly one block unaccounted for: block 0, the 1024 bytes skipped at the beginning for things like the boot sector.
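Putting the two greps and the extra block 0 together in one go (a sketch, run against the same volume as above):

    single=$(LANG=C sudo dumpe2fs /dev/purgatory/test1 | grep ' at ' | grep -v ',' | wc -l)
    double=$(LANG=C sudo dumpe2fs /dev/purgatory/test1 | grep ' at ' | grep ',' | wc -l)
    echo $(( single + 2 * double + 1 ))    # metadata blocks plus block 0 -> 52 in this example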

  • And if the journal takes only 4096k, why doesn't the math work out? (95054 - 4096) != 91456. Commented Feb 20, 2015 at 14:03
  • All numbers here are in k, so 95054k total - 4096k of journal != 91456k available. Commented Feb 20, 2015 at 14:05
  • df on fs with journal: 95054k -- df on fs without journal: 99150k -- and don't mix "usable" and "free" space. Commented Feb 20, 2015 at 14:08
  • Some filesystems, e.g. xfs, dynamically allocate space for inodes as needed. You might want to try xfs and btrfs, if you're curious. mkfs.xfs -l size=512 -d agcount=1 will make a filesystem with the absolute minimum log (aka journal) size, but write performance might suffer. I don't think the XFS code supports operating without a log. Possibly read-only, to support cases where an external log device is broken. (Also, agcount=1 is probably another terrible idea for write performance, esp. parallel. And allocation group headers are probably small, too.) Commented Feb 21, 2015 at 11:07
  • Got curious and tried XFS. If there is a combination of options for Linux XFS that will let the minimum log size go down to the absolute minimum of 512 blocks, IDK what it is. mkfs.xfs -d agcount=1 on a 100MiB partition made a FS of 95980kiB, with 5196k used, 90784k available. The default agcount is 4, and the default log size is 1605 blocks (also the minimum). So XFS does use as small a log as it's willing to let you specify, for small FSes. Commented Feb 21, 2015 at 11:22
20

The short answer:

Not all space on the block device becomes available space for your data: some of the raw space is needed for the file-system internals, the behind-the-scenes bookkeeping.

That bookkeeping includes the superblock, block group descriptors, block and inode bitmaps, and the inode table. In addition, copies of the superblock are stored at a number of locations for backup/recovery purposes. A long read about the ext4 file system internals can be found on ext4.wiki.kernel.org.

Since ext4 is a journaled file-system, its journal takes up some space as well.

Additionally some space is reserved for future expansions of the file-system.
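All of those contributors can be read from the superblock before the filesystem is even mounted, for example (a sketch against the test volume used below):

    sudo dumpe2fs -h /dev/MyVG/test | grep -E 'Reserved GDT blocks|Reserved block count|Journal size|Inode count'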

The long answer:

I have recreated your scenario on one of my test systems:

lvcreate -L 100M -n test MyVG
mkfs.ext4 -b 1024 /dev/MyVG/test

Then before even mounting the file-system a dumpe2fs shows:

Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              25688
Block count:              102400
Reserved block count:     5120
Free blocks:              93504
Free inodes:              25677
First block:              1
Block size:               1024
Fragment size:            1024
Reserved GDT blocks:      256
Blocks per group:         8192
Fragments per group:      8192
Inodes per group:         1976
Inode blocks per group:   247
Flex block group size:    16
Filesystem created:       Fri Feb 20 13:20:54 2015
Last mount time:          n/a
Last write time:          Fri Feb 20 13:20:55 2015
...
Journal size:             4096k
...

and after mounting:

df /tmp/test/
Filesystem            1K-blocks  Used Available Use% Mounted on
/dev/mapper/MyVG-test     99150  5646     88384   7% /tmp/test

So what does df show us? From the 102400 blocks of the raw storage device capacity 99150 1K blocks are visible to the file-system, meaning that 3250 1-Kilobyte blocks of raw storage space have become unusable for actual data storage.

Where did those blocks go to? Scrolling down in the dumpe2fs output shows exactly where:

Group 0: (Blocks 1-8192) [ITABLE_ZEROED]
  Checksum 0x0d67, unused inodes 1965
  Primary superblock at 1, Group descriptors at 2-2
  Reserved GDT blocks at 3-258
  Block bitmap at 259 (+258), Inode bitmap at 275 (+274)
  Inode table at 291-537 (+290)
  4683 free blocks, 1965 free inodes, 2 directories, 1965 unused inodes
  Free blocks: 3510-8192
  Free inodes: 12-1976

1 block (block #0): the first 1024 bytes are skipped to allow for the installation of x86 boot sectors and other oddities.
1 block is occupied by the primary superblock.
1 block contains the group descriptors.
256 blocks are reserved for the Group Descriptor Table, to allow future resizing of the filesystem.
16 blocks are assigned to the block bitmaps.
16 blocks are assigned to the inode bitmaps.
247 blocks are assigned to the inode table (blocks 291-537 above).
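Summing that list up (a quick sanity check):

    echo $(( 1 + 1 + 1 + 256 + 16 + 16 + 247 ))    # = 538, blocks 0 through 537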

That already accounts for 538 of the 3250 missing blocks (blocks 0 through 537). An ext4 file system is split into a series of block groups, and scrolling down further shows a similar allocation of raw storage capacity to file-system internals in the other block groups:

Group 1: (Blocks 8193-16384) [INODE_UNINIT, ITABLE_ZEROED]
  Checksum 0x0618, unused inodes 1976
  Backup superblock at 8193, Group descriptors at 8194-8194
  Reserved GDT blocks at 8195-8450
  Block bitmap at 260 (+4294959363), Inode bitmap at 276 (+4294959379)
  Inode table at 538-784 (+4294959641)
  7934 free blocks, 1976 free inodes, 0 directories, 1976 unused inodes
  Free blocks: 8451-16384
  Free inodes: 1977-3952
Group 2: (Blocks 16385-24576) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
  Checksum 0xcfd3, unused inodes 1976
  Block bitmap at 261 (+4294951172), Inode bitmap at 277 (+4294951188)
  Inode table at 785-1031 (+4294951696)
  8192 free blocks, 1976 free inodes, 0 directories, 1976 unused inodes
  Free blocks: 16385-24576
  Free inodes: 3953-5928
Group ....

Now back to the df output:

df /tmp/test/
Filesystem            1K-blocks  Used Available Use% Mounted on
/dev/mapper/MyVG-test     99150  5646     88384   7% /tmp/test

The reason that this fresh file-system already shows 7% of its capacity as in use is:

  99150 (the size of the file-system)
  MINUS 5120 (the reserved block count)
  MINUS 5646 (used blocks, 4096 of which belong to the journal, again part of the dumpe2fs output)
  = 88384 (the Available column)

The free block count in dumpe2fs is the size of the file-system minus the actual usage (it does not take the reserved blocks into account): 99150 - 5646 = 93504.
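Both identities are easy to verify in a shell:

    echo $(( 99150 - 5120 - 5646 ))    # = 88384, the Available column in df
    echo $(( 99150 - 5646 ))           # = 93504, the free block count in dumpe2fs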

1

Not an answer to the question, but I got curious, so I imagine other people will be too. Since I had a liveCD booted already, and had a hard drive I could mess with without worrying about typos damaging anything, I went ahead and tested.

I made 100MiB partitions and formatted them with every filesystem Ubuntu 14.10 ships a mkfs for (except minix, which only supports 64MiB, and bfs, which is some SCO thing I've never heard of).

First I looked at df -k available space (with default mkfs settings), then I dded /dev/zero to a file on each FS to make sure they could be filled all the way up. (i.e. check that the claimed available space was really available.)
for i in /media/ubuntu/small-*;do sudo dd if=/dev/zero of="$i/fill" bs=16k;done

  • FS: empty `df -k` : non-zero `df -k` when full (false bottom)
  • jfs: 101020k
  • fat32: 100808k : 4
  • ntfs: 99896k
  • btrfs: 98276k : 4428
  • ext2: 92480k
  • xfs: 90652k : 20
  • ext4: 86336k
  • ext3: 88367k
  • reiserfs(v3): 69552k

Why does btrfs have so much unusable space? Maybe for metadata? Well, nope:

$ for i in /media/ubuntu/small-*;do sudo touch "$i/touched";done touch: cannot touch ‘/media/ubuntu/small-btrfs/touched’: No space left on device touch: cannot touch ‘/media/ubuntu/small-reiser/touched’: No space left on device 

Neither of the tree-based filesystems can pack in even an empty file, but all the others can.

Or just look at how big a file you can create:

$ ls -SdlG --block-size=1k /media/ubuntu/small-*/*
-rw-r--r-- 1 root   101020 Feb 21 11:55 /media/ubuntu/small-jfs/fill
-rw-r--r-- 1 ubuntu 100804 Feb 21 11:55 /media/ubuntu/small-fat/fill
-rw------- 1 ubuntu  99848 Feb 21 11:55 /media/ubuntu/small-ntfs/fill
-rw-r--r-- 1 root    97216 Feb 21 11:55 /media/ubuntu/small-ext2/fill
-rw-r--r-- 1 root    93705 Feb 21 11:27 /media/ubuntu/small-btrfs/foo
-rw-r--r-- 1 root    93120 Feb 21 11:55 /media/ubuntu/small-ext3/fill
-rw-r--r-- 1 root    91440 Feb 21 11:55 /media/ubuntu/small-ext/fill
-rw-r--r-- 1 root    90632 Feb 21 11:55 /media/ubuntu/small-xfs/fill
-rw-r--r-- 1 root    69480 Feb 21 11:55 /media/ubuntu/small-reiser/fill
drwx------ 2 root        12 Feb 21 11:33 /media/ubuntu/small-ext2/lost+found
drwx------ 2 root        12 Feb 21 11:43 /media/ubuntu/small-ext3/lost+found
drwx------ 2 root        12 Feb 21 11:29 /media/ubuntu/small-ext/lost+found

(I called my ext4 partition "small-ext" because I wasn't planning to go nuts and make every filesystem, so ext=ext4 here. NOT the original pre-ext2 ext.)

And df -k output after removing them again:

/dev/sdd6       95980   5328  90652   6% /media/ubuntu/small-xfs
/dev/sdd7       95054   1550  86336   2% /media/ubuntu/small-ext
/dev/sdd5      102400  93880 101020  96% /media/ubuntu/small-btrfs
/dev/sdd8      101168 101168      0 100% /media/ubuntu/small-jfs
/dev/sdd9       99150   1550  92480   2% /media/ubuntu/small-ext2
/dev/sdd10     102392  32840  69552  33% /media/ubuntu/small-reiser
/dev/sdd11     100808      1 100808   1% /media/ubuntu/small-fat
/dev/sdd12     102396   2548  99848   3% /media/ubuntu/small-ntfs
/dev/sdd13      95054   1567  88367   2% /media/ubuntu/small-ext3

(jfs went back to 1% used after I removed "touched" as well. Either there was a time delay, or it took another write to get the available size to update.)
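If you run into that delay on btrfs, forcing a transaction commit usually nudges df into catching up (a sketch, not something I tested as part of this experiment):

    sudo btrfs filesystem sync /media/ubuntu/small-btrfs
    df -k /media/ubuntu/small-btrfs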

Anyway, I think that's about it for my curiosity.

  • (I'm currently using btrfs on my desktop. It does have a delay after unlinking a file before the size available shown in df updates.) Commented May 22, 2022 at 4:51
