You assumption about adjacency is correct.
If we use TPC-H as an example: Clustering the LINEITEMS table on on ORDERID will locate all order lines belonging to the same LINEITEM physically adjacent on disk. This speeds up queries that fetch all order lines for a given ORDERID. Clustering on the foreign key to the parent also allow fast merge joins between the child and parent.
There are a few downsides to the clustering approach:
- The entire table must be kept sorted on disk. If you are expecting a great many inserts with
ORDERID not being sequentially generated, page splits will be more expensive. This is something you can throw hardware at. - If
ORDERID is generated sequentially, you will create a hotspot at the end of the table. In some database engines (For example SQL Server) this is a problem at high insert speed. In SQL Server, this typically kicks in around 5K-10K inserts/sec. - The cluster index keys either have to be unique (ex:
ORDERID, LINENUMBER) or padded with some hidden column to make them unique. Since the composite cluster key must be present in all other indexes, this makes the secondary, non-clustered indexes larger. - Storing the table clustered will force a B-tree traversal when you want to locate data via a secondary index (unless the secondary index is covering the query). If you instead kept the table as a heap, all other indexes would only have an 8B overhead and your B-tree traversals are cut in half.
The vast majority of cases, you will want to cluster both the parent and the child on the same leading key. But if you expect the child table to be accessed via many different indexes - it may be worth considering the alternatives.