TPU Cluster Director Overview
TPU Cluster Director is designed to give you direct, reservation-based control over your Google Cloud AI accelerators. For Cloud TPU, Cluster Director foundational capabilities provide a new tier of service that goes beyond a multi-tenant offering to deliver physically isolated TPU capacity:
- Dedicated, physically co-located capacity: We now offer dense, co-located TPU reservations, giving you complete control over your hardware for optimal network performance and workload scheduling.
- Advanced maintenance and control: You get precise control over maintenance events, with the ability to target specific VMs, cubes, Pods, or entire reservations, and to manage the sequence and pace of these events to minimize business impact.
- Topology-aware scheduling: You get a complete view of the physical topology, health, and utilization of the hardware, enabling smarter, performance-driven workload placement.
Cluster Director foundations is fully integrated with Google Kubernetes Engine (GKE). This integration offers several features to enhance large-scale AI workloads:
- Improved efficiency, fault tolerance, and resiliency: provides a robust environment for demanding AI tasks.
- Topology-aware node pools and workload placement: dense, co-located reservations let you target specific Pods or cubes, enabling finer-grained workload scheduling, as shown in the sketch below.
With Cluster Director foundations on GKE, you benefit from better utilization, higher performance and scalability of your workloads, improved goodput and reliability, and comprehensive observability into physical capacity (from hosts all the way to GKE clusters).
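For example, a workload can opt into a specific slice shape by selecting the TPU node labels that GKE attaches to TPU node pools. The following is a minimal sketch using the kubernetes Python client; the project path, image name, accelerator type, topology value, and chip count are illustrative assumptions, not values taken from this page.

```python
# A minimal sketch of topology-aware placement on GKE with the kubernetes
# Python client: the Job's pods land only on nodes whose TPU labels match.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster

container = client.V1Container(
    name="trainer",
    image="us-docker.pkg.dev/my-project/my-repo/trainer:latest",  # hypothetical image
    resources=client.V1ResourceRequirements(
        # Chip count per node is an assumption; requests and limits must match.
        requests={"google.com/tpu": "4"},
        limits={"google.com/tpu": "4"},
    ),
)

pod_spec = client.V1PodSpec(
    restart_policy="Never",
    node_selector={
        # GKE TPU node labels; the values here are assumptions for illustration.
        "cloud.google.com/gke-tpu-accelerator": "tpu-v6e-slice",
        "cloud.google.com/gke-tpu-topology": "2x2",
    },
    containers=[container],
)

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="tpu-topology-aware-job"),
    spec=client.V1JobSpec(
        template=client.V1PodTemplateSpec(spec=pod_spec),
        backoff_limit=0,
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```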
Cluster Director foundations for TPUs on GKE is available through the new All Capacity mode for reservations.
All Capacity mode
Previously, TPU capacity was offered only through a "managed" mode, in which Google automatically replaces any faulty TPU machines but holds back part of your reserved capacity to help ensure your TPU slices have the resources they need to restart. Google now offers a new capacity mode for TPUs, known as "All Capacity" mode. In this mode, you have full visibility into the TPU hardware topology, utilization status, and health status of your reserved capacity. You also have access to your full reserved capacity, but you are responsible for managing failures and planned maintenance.
Key features of All Capacity mode include:
- Full control and visibility: you have complete control over your reserved capacity and full visibility into your hardware health and topology. This means you can see all available capacity, including capacity that was previously held back, and manage machine failures directly (see the sketch after this list).
- Dedicated capacity: you can access dedicated capacity that is always available for your AI workloads. With full capacity and no holdbacks, you get greater predictability and higher allocation, so you can use all of your reserved TPU capacity. Capacity that was previously held back is now also available to run your lower-priority workloads.
- Optimized performance: TPU All Capacity mode provides dense co-location of large accelerator resources with ultra-low latency networking, which is critical for large-scale, tightly-coupled ML and HPC workloads. The architecture is optimized for maximum performance in both training and inference workloads.
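Because you own failure handling in All Capacity mode, a basic starting point is polling the state and health of your TPU resources. The following is a minimal sketch assuming the google-cloud-tpu client library and TPU VM nodes; the project and zone are placeholders, and the richer topology and utilization views described above are not shown here.

```python
# Minimal health-check sketch using the Cloud TPU v2 API.
# "my-project" and "us-central2-b" are placeholder values.
from google.cloud import tpu_v2

client = tpu_v2.TpuClient()
parent = "projects/my-project/locations/us-central2-b"

for node in client.list_nodes(parent=parent):
    # node.state reports lifecycle (READY, REPAIRING, ...); node.health reports
    # hardware health. In All Capacity mode, acting on unhealthy nodes is up to you.
    print(node.name, node.accelerator_type, node.state.name, node.health.name)
    if node.health != tpu_v2.Node.Health.HEALTHY:
        print(f"  -> needs attention: {node.health_description or 'no details'}")
```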
Supported TPU generations
TPU All Capacity mode and its features are available on Trillium (TPU v6e), Ironwood (TPU7x), and future TPU generations. Support for older TPU generations is not planned.
TPU Cluster Director terminology
Cluster Director topology concepts consist of four levels: Cluster, Block, Sub-block, and Host. A cluster is a Google deployment unit of physical TPU capacity in Pod multiples. All TPU capacity in a cluster is within one zone, and a TPU reservation in All Capacity mode is always within one cluster. For TPUs, the remaining topology concepts map to physical components as shown in the following tables.
Trillium
| Topology concepts | Trillium component | Cores | Chips | Hosts |
|---|---|---|---|---|
| N/A | Chip | 1 | 1 | N/A |
| Host | Host | 8 | 8 | 1 |
| Sub-block | Trillium Pod | 256 | 256 | 32 |
| Block | Multiple Trillium Pods (up to 16) in a reservation | Up to 4,096 | Up to 4,096 | Up to 512 |
- Allowed slice shapes in a sub-block: 1x1, 2x2, 2x4, 4x4, 4x8, 8x8, 8x16, and 16x16.
- One reservation can have multiple blocks, and each block can have 1 to 16 Trillium Pods.
For more information about Trillium slice sizes, see Trillium supported configurations.
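To make the table concrete, the following is a small arithmetic sketch (plain Python, no Cloud APIs) that maps a 2D Trillium slice shape to chips, hosts, and Pods using the figures above.

```python
import math

# Figures from the Trillium table above: 1 core per chip, 8 chips per host,
# 256 chips (32 hosts) per Trillium Pod (one sub-block).
CHIPS_PER_HOST = 8
CHIPS_PER_POD = 256

def trillium_slice_footprint(x: int, y: int) -> dict:
    """Chips, hosts, and Pods spanned by an x-by-y Trillium slice."""
    chips = x * y
    return {
        "chips": chips,
        "hosts": math.ceil(chips / CHIPS_PER_HOST),
        "pods": math.ceil(chips / CHIPS_PER_POD),
    }

# A 16x16 slice fills one full Trillium Pod: 256 chips across 32 hosts.
print(trillium_slice_footprint(16, 16))  # {'chips': 256, 'hosts': 32, 'pods': 1}
```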
Ironwood
| Topology concepts | Ironwood component | Cores | Chips | Hosts |
|---|---|---|---|---|
| N/A | Chip | 2 | 1 | N/A |
| Host | Host | 8 | 4 | 1 |
| Sub-block | Cube | 128 | 64 | 16 |
| Block | Multiple Ironwood cubes, up to a full Pod (144 cubes) | Up to 18,432 | Up to 9,216 | Up to 2,304 |
- Example slice shapes in a block: 1x1x1, 2x2x1, 2x2x2, 2x4x4, 4x4x4, 8x8x8, 16x8x8, 16x16x8, and 12x24x24 (and many more).
- A reservation can have one or more Ironwood cubes, up to a full Ironwood Pod.
For more information about Ironwood slice sizes, see TPUv7x supported configurations.
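The same kind of arithmetic applies to Ironwood, except that slices are 3D and each chip has two cores. The sketch below uses only the figures from the table above (2 cores per chip, 4 chips per host, 64 chips per cube).

```python
import math

# Figures from the Ironwood table above: 2 cores per chip, 4 chips per host,
# 64 chips (16 hosts) per cube; a full Pod is 144 cubes (9,216 chips).
CORES_PER_CHIP = 2
CHIPS_PER_HOST = 4
CHIPS_PER_CUBE = 64

def ironwood_slice_footprint(x: int, y: int, z: int) -> dict:
    """Cores, chips, hosts, and cubes spanned by an x-by-y-by-z Ironwood slice."""
    chips = x * y * z
    return {
        "cores": chips * CORES_PER_CHIP,
        "chips": chips,
        "hosts": math.ceil(chips / CHIPS_PER_HOST),
        "cubes": math.ceil(chips / CHIPS_PER_CUBE),
    }

# A 4x4x4 slice spans one cube: 64 chips (128 cores) across 16 hosts.
print(ironwood_slice_footprint(4, 4, 4))  # {'cores': 128, 'chips': 64, 'hosts': 16, 'cubes': 1}
```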