TPU Cluster Director Overview
TPU Cluster Director is designed to give you direct, reservation-based control over your Google Cloud AI accelerators. For Cloud TPU, Cluster Director foundational capabilities provide a new tier of service that goes beyond a multi-tenant offering to deliver physically isolated TPU capacity:
- Dedicated, physically co-located capacity: We now offer dense, co-located TPU reservations, giving you complete control over your hardware for optimal network performance and workload scheduling.
- Advanced maintenance and control: You get precise control over maintenance events, with the ability to target specific VMs, cubes, Pods, or entire reservations, and to manage the sequence and pace of these events to minimize business impact.
- Topology-aware scheduling: You get a complete view of the physical topology, health, and utilization of the hardware, enabling smarter, performance-driven workload placement.
Cluster Director foundations is fully integrated with Google Kubernetes Engine (GKE). This integration offers several features to enhance large-scale AI workloads:
- Improved efficiency, fault tolerance, and resiliency: provides a robust environment for demanding AI tasks.
- Topology-aware node pools and workload placement: dense, co-located reservations let you target specific Pods or cubes, enabling finer-grained workload scheduling, as shown in the sketch below.
With Cluster Director foundations on GKE, you benefit from better utilization, higher performance and scalability of your workloads, improved goodput and reliability, and comprehensive observability into physical capacity (from hosts all the way to GKE clusters).
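For example, a workload can opt into a specific slice shape by selecting the TPU node labels that GKE attaches to TPU node pools. The following is a minimal sketch using the kubernetes Python client; the project path, image name, accelerator type, topology value, and chip count are illustrative assumptions, not values taken from this page.

```python
# A minimal sketch of topology-aware placement on GKE with the kubernetes
# Python client: the Job's pods land only on nodes whose TPU labels match.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster

container = client.V1Container(
    name="trainer",
    image="us-docker.pkg.dev/my-project/my-repo/trainer:latest",  # hypothetical image
    resources=client.V1ResourceRequirements(
        # Chip count per node is an assumption; requests and limits must match.
        requests={"google.com/tpu": "4"},
        limits={"google.com/tpu": "4"},
    ),
)

pod_spec = client.V1PodSpec(
    restart_policy="Never",
    node_selector={
        # GKE TPU node labels; the values here are assumptions for illustration.
        "cloud.google.com/gke-tpu-accelerator": "tpu-v6e-slice",
        "cloud.google.com/gke-tpu-topology": "2x2",
    },
    containers=[container],
)

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="tpu-topology-aware-job"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(spec=pod_spec),
        backoff_limit=0,
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```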
Cluster Director foundations for TPUs on GKE is available through the new All Capacity mode for reservations.
All Capacity mode
Previously, TPU capacity was offered only through a "managed" mode, in which Google automatically replaces any faulty TPU machines but holds back part of your reserved capacity to help ensure your TPU slices have the resources they need to restart. Google now offers a new capacity mode for TPUs, known as "All Capacity" mode. In this mode, you have full visibility into the TPU hardware topology, utilization status, and health status of your reserved capacity. You also have access to your full reserved capacity, but you are responsible for managing failures and planned maintenance.
Key features of All Capacity mode include:
- Full control and visibility: you have complete control over your reserved capacity and full visibility into your hardware health and topology. This means you can see all available capacity, including capacity that was previously held back, and manage machine failures directly (see the sketch after this list).
- Dedicated capacity: you can access dedicated capacity that is always available for your AI workloads. With full capacity and no holdbacks, you get greater predictability and higher allocation, so you can use all of your reserved TPU capacity. Capacity that was previously held back is now also available to run your lower-priority workloads.
- Optimized performance: TPU All Capacity mode provides dense co-location of large accelerator resources with ultra-low latency networking, which is critical for large-scale, tightly-coupled ML and HPC workloads. The architecture is optimized for maximum performance in both training and inference workloads.
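Because you own failure handling in All Capacity mode, a basic starting point is polling the state and health of your TPU resources. The following is a minimal sketch assuming the google-cloud-tpu client library and TPU VM nodes; the project and zone are placeholders, and the richer topology and utilization views described above are not shown here.

```python
# Minimal health-check sketch using the Cloud TPU v2 API.
# "my-project" and "us-central2-b" are placeholder values.
from google.cloud import tpu_v2

client = tpu_v2.TpuClient()
parent = "projects/my-project/locations/us-central2-b"

for node in client.list_nodes(parent=parent):
    # node.state reports lifecycle (READY, REPAIRING, ...); node.health reports
    # hardware health. In All Capacity mode, acting on unhealthy nodes is up to you.
    print(node.name, node.accelerator_type, node.state.name, node.health.name)
    if node.health != tpu_v2.Node.Health.HEALTHY:
        print(f"  -> needs attention: {node.health_description or 'no details'}")
```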
Supported TPU generations
TPU All Capacity mode and its features are available on Trillium (TPU v6e), Ironwood (TPU7x), and future TPU generations. Support for older TPU generations is not planned.
TPU Cluster Director terminology
Cluster Director topology concepts consist of four levels: Cluster, Block, Sub-block, and Host. A cluster is a Google deployment unit of physical TPU capacity in Pod multiples. All TPU capacity in a cluster is within one zone, and a TPU reservation in All Capacity mode is always within one cluster. For TPUs, the remaining topology concepts map to physical components as shown in the following tables.
Trillium
| Topology concepts | Trillium component | Cores | Chips | Hosts |
|---|---|---|---|---|
| N/A | Chip | 1 | 1 | N/A |
| Host | Host | 8 | 8 | 1 |
| Sub-block | Trillium Pod | 256 | 256 | 32 |
| Block | Multiple Trillium Pods (up to 16) in a reservation | Up to 4,096 | Up to 4,096 | Up to 512 |
- Allowed slice shapes in a sub-block: 1x1, 2x2, 2x4, 4x4, 4x8, 8x8, 8x16, and 16x16.
- One reservation can have multiple blocks, and each block can have 1 to 16 Trillium Pods.
For more information about Trillium slice sizes, see Trillium supported configurations.
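To make the table concrete, the following is a small arithmetic sketch (plain Python, no Cloud APIs) that maps a 2D Trillium slice shape to chips, hosts, and Pods using the figures above.

```python
import math

# Figures from the Trillium table above: 1 core per chip, 8 chips per host,
# 256 chips (32 hosts) per Trillium Pod (one sub-block).
CHIPS_PER_HOST = 8
CHIPS_PER_POD = 256

def trillium_slice_footprint(x: int, y: int) -> dict:
    """Chips, hosts, and Pods spanned by an x-by-y Trillium slice."""
    chips = x * y
    return {
        "chips": chips,
        "hosts": math.ceil(chips / CHIPS_PER_HOST),
        "pods": math.ceil(chips / CHIPS_PER_POD),
    }

# A 16x16 slice fills one full Trillium Pod: 256 chips across 32 hosts.
print(trillium_slice_footprint(16, 16))  # {'chips': 256, 'hosts': 32, 'pods': 1}
```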
Ironwood
| Topology concepts | Ironwood component | Cores | Chips | Hosts |
|---|---|---|---|---|
| N/A | Chip | 2 | 1 | N/A |
| Host | Host | 8 | 4 | 1 |
| Sub-block | Cube | 128 | 64 | 16 |
| Block | Multiple Ironwood cubes, up to a full Pod (144 cubes) | Up to 18,432 | Up to 9,216 | Up to 2,304 |
- Example slice shapes in a block: 1x1x1, 2x2x1, 2x2x2, 2x4x4, 4x4x4, 8x8x8, 16x8x8, 16x16x8, and 12x24x24 (and many more).
- A reservation can have one or more Ironwood cubes, up to a full Ironwood Pod.
For more information about Ironwood slice sizes, see TPUv7x supported configurations.
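The same kind of arithmetic applies to Ironwood, except that slices are 3D and each chip has two cores. The sketch below uses only the figures from the table above (2 cores per chip, 4 chips per host, 64 chips per cube).

```python
import math

# Figures from the Ironwood table above: 2 cores per chip, 4 chips per host,
# 64 chips (16 hosts) per cube; a full Pod is 144 cubes (9,216 chips).
CORES_PER_CHIP = 2
CHIPS_PER_HOST = 4
CHIPS_PER_CUBE = 64

def ironwood_slice_footprint(x: int, y: int, z: int) -> dict:
    """Cores, chips, hosts, and cubes spanned by an x-by-y-by-z Ironwood slice."""
    chips = x * y * z
    return {
        "cores": chips * CORES_PER_CHIP,
        "chips": chips,
        "hosts": math.ceil(chips / CHIPS_PER_HOST),
        "cubes": math.ceil(chips / CHIPS_PER_CUBE),
    }

# A 4x4x4 slice spans one cube: 64 chips (128 cores) across 16 hosts.
print(ironwood_slice_footprint(4, 4, 4))  # {'cores': 128, 'chips': 64, 'hosts': 16, 'cubes': 1}
```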