I understand the standard explanation for why B-trees are used in databases: they minimize disk seeks by packing many keys into each node, keeping the tree shallow (3-4 levels), and enabling efficient sequential scans.
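To make the "3-4 levels" claim concrete, here's the back-of-envelope arithmetic I have in mind (my own illustrative numbers: 16 KB pages like InnoDB's default, and a guess of roughly 100 bytes per key + child pointer, so treat this as a sketch, not exact figures):

```python
import math

# Back-of-envelope B-tree height estimate (illustrative numbers only).
PAGE_SIZE = 16 * 1024       # assumed 16 KB pages (InnoDB's default page size)
ENTRY_SIZE = 100            # assumed ~100 bytes per key + child pointer
ROWS = 100_000_000          # a fairly large table

fanout = PAGE_SIZE // ENTRY_SIZE             # entries that fit in one node
height = math.ceil(math.log(ROWS, fanout))   # levels needed to cover ROWS keys

print(f"fanout ~ {fanout}, height ~ {height}")   # fanout ~ 163, height ~ 4
```

So a point lookup touches about 4 pages, versus the ~27 node visits a binary tree would need for the same row count, which is where the "3-4 levels" figure comes from.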
I'm confused because the database storage engine (e.g., InnoDB) interacts with data through the OS file system abstraction, and the OS makes no guarantees about where data is physically written on disk (it has its own algorithms for minimizing disk head movement, etc.).
My doubts:
The OS doesn't write to disk sequentially in the order the storage engine wants it to. For example, Windows doesn't allow easy volume resizing because data placement is unpredictable. How can the storage engine's "page optimization" actually translate into fewer physical disk seeks if it has no control over where pages end up on the physical medium?
The file system API abstracts away block storage details. If B-trees are designed for disk seek optimization specifically, but the storage engine can't control or even know about physical disk layout, isn't this optimization working at the wrong level of abstraction?
Despite all this, B-trees clearly work well in practice. Is it that the OS does try to keep sequentially written data physically close together (maybe not perfectly)? Or is it more about reducing the number of I/O requests rather than their physical locality? (Rough sketch of what I mean below.)
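By the second hypothesis I mean something like this: the engine issues one page-sized read per level it descends, so the OS only ever sees a handful of read requests per lookup, wherever it ends up placing those blocks. A minimal sketch of what I picture (the file name and page numbers are made up, os.pread is POSIX-only, and the dummy file is just there so the snippet runs on its own):

```python
import os

PAGE_SIZE = 16 * 1024   # same assumed page size as above

def read_page(fd, page_no):
    """One page-granular read request to the OS; where that block lives
    physically on the medium is entirely up to the file system."""
    return os.pread(fd, PAGE_SIZE, page_no * PAGE_SIZE)  # POSIX-only

# Stand-in for an index file so the sketch is self-contained; a real engine
# would open its own data file (e.g., an .ibd file).
with open("toy_index.bin", "wb") as f:
    f.write(b"\0" * PAGE_SIZE * 8)   # 8 empty "pages"

# Hypothetical lookup: descend a 4-level B-tree by reading one page per level.
# The page numbers are invented; a real engine reads them out of each node.
fd = os.open("toy_index.bin", os.O_RDONLY)
for page_no in (0, 3, 5, 7):         # root -> internal -> internal -> leaf
    node = read_page(fd, page_no)
    # ...parse the node, pick the child page for the next iteration...
os.close(fd)

# Either way, the OS sees ~4 page-sized read requests per lookup instead of
# ~27 with a binary tree, regardless of where it placed those blocks.
```

If that's the right way to look at it, then the benefit survives the abstraction because it's counted in requests, not in head movement.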
Related:
Why do we need a separate datastructure like B-Tree for database and file system?
I've read the standard explanations about tree height and fanout, but they all seem to assume control over physical layout.