How to Understand Google's New TPU 8 Series for Massive AI Training and Inference

Imagine you are building a giant robot that needs a powerful brain. Google has announced two new specialized brains, the TPU 8t and the TPU 8i. One focuses on learning from huge amounts of information, while the other specializes in making lightning-fast decisions. Let’s explore how these chips are designed and how you can start using them.

Google recently announced its eighth-generation Tensor Processing Units, the TPU 8 series, at its Cloud Next event. The release stands out because, for the first time in a decade, Google has developed two separate chip designs for different kinds of AI work. The TPU 8t is designed for training, the phase in which a model learns from data, while the TPU 8i is optimized for inference, the phase in which a trained model answers questions or generates content. The split responds to the growing complexity of modern AI systems such as large language models (LLMs).

The way these chips are designed and manufactured also marks a shift. For many years, Broadcom was Google’s only design partner for TPUs. MediaTek has now joined the program to help design the TPU 8i inference chip. Both TPU 8 versions are built on the TSMC N3 process family and use HBM3E (High Bandwidth Memory). Individual chips from competitors such as Nvidia or AMD may offer more raw compute per socket, but Google’s strategy focuses on how its chips work together in very large groups called superpods. A single TPU 8t superpod can contain 9,600 chips working on the same job in parallel.

On paper, the TPU 8t offers 12.6 PFLOPS of FP4 compute with 216 GB of HBM3E memory. The TPU 8i provides 10.1 PFLOPS of FP4 compute but a larger memory capacity of 288 GB. You might wonder why the inference chip has more memory. Inference requires storing large amounts of “context” (in practice, the model’s key-value cache) so the AI can remember what you previously said in a conversation, and that storage grows with every token. Google chose HBM3E over the newer HBM4 to keep costs lower and to ensure it can produce enough chips to meet high demand from customers like Apple and Meta.
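To make these numbers more concrete, here is a small back-of-the-envelope sketch. It uses the chip figures quoted in this article together with the standard key-value cache sizing formula. The model dimensions (80 layers, 8 KV heads, and so on) are hypothetical values chosen only for illustration and do not describe any specific model.

```python
# Figures as quoted in this article (not independently verified here).
CHIPS_PER_8T_SUPERPOD = 9_600
PFLOPS_FP4_8T = 12.6
HBM_GB = {"TPU 8t": 216, "TPU 8i": 288}

GB = 10**9  # decimal gigabytes, the usual convention for memory capacity


def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   context_tokens, bytes_per_value, batch_size=1):
    """Size of a transformer key-value cache in bytes.

    The factor 2 accounts for storing one key vector and one value vector
    per token, per layer, per KV head.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * context_tokens * bytes_per_value * batch_size)


# Hypothetical model shape, for illustration only.
cache = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128,
                       context_tokens=128_000, bytes_per_value=2)

# Aggregate figures for one TPU 8t superpod.
superpod_eflops = CHIPS_PER_8T_SUPERPOD * PFLOPS_FP4_8T / 1_000
superpod_hbm_pb = CHIPS_PER_8T_SUPERPOD * HBM_GB["TPU 8t"] / 1_000_000

print(f"KV cache per conversation: {cache / GB:.1f} GB")
for chip, capacity_gb in HBM_GB.items():
    # Upper bound: ignores the memory taken by the model weights themselves.
    print(f"{chip}: at most {(capacity_gb * GB) // cache} such conversations")
print(f"TPU 8t superpod: {superpod_eflops:.2f} EFLOPS FP4, "
      f"{superpod_hbm_pb:.2f} PB of HBM")
```

Under these made-up dimensions, a single 128,000-token conversation needs about 41.9 GB of cache, so at most six fit in 288 GB and five in 216 GB before counting the model weights. That gap is the practical reason an inference chip benefits from extra memory.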

The TPU 8i uses a chip-to-chip interconnect layout called “Boardfly” topology. In older designs, data had to travel through many “hops” to get from one chip to another, which slowed things down. Boardfly uses a three-tier hierarchy that reduces the distance data needs to travel by 56%. This is especially helpful for “Mixture-of-Experts” models, where each piece of input is routed to a few specialized sub-networks (experts) that may sit on different chips, so chips must exchange data very quickly. The TPU 8i also includes a new feature called the Collectives Acceleration Engine (CAE). This engine takes over routine synchronization work, letting the main compute units focus entirely on the model’s calculations, which makes the whole system five times faster at responding to users.
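To see why Mixture-of-Experts models put so much pressure on the interconnect, consider the following minimal sketch of top-k expert routing in plain Python. It is a generic illustration of the technique, not Google’s implementation, and the assumption that each expert lives on a different chip is ours.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts for one token.

    Returns (expert_index, weight) pairs; weights are renormalized
    so that they sum to 1 across the chosen experts.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def dispatch(batch_logits, k=2):
    """Group (token_id, weight) pairs by the expert that must process them."""
    per_expert = {}
    for token_id, logits in enumerate(batch_logits):
        for expert, weight in route_top_k(logits, k):
            per_expert.setdefault(expert, []).append((token_id, weight))
    return per_expert


# Two tokens, four experts; the router scores are invented for the example.
batch_logits = [
    [2.0, 0.1, 1.0, -1.0],  # token 0 prefers experts 0 and 2
    [0.0, 3.0, 0.2, 2.5],   # token 1 prefers experts 1 and 3
]
assignments = dispatch(batch_logits)
for expert in sorted(assignments):
    print(f"expert {expert}: tokens {[t for t, _ in assignments[expert]]}")
```

If each expert sat on its own chip, every token’s data would have to be sent to two different chips and the results gathered back again. That all-to-all exchange is the kind of traffic that shorter network paths and offloaded collective operations are meant to speed up.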

The TPU 8t, on the other hand, is built for the heavy lifting of training. It keeps a technology called SparseCore, which accelerates sparse lookups, such as fetching a handful of rows from a very large embedding table. It also introduces a feature called TPUDirect RDMA, which lets the chip pull data directly from storage without routing every transfer through the host processor (CPU). This makes accessing stored data ten times faster than before. Both chips now use Google’s own Arm-based Axion CPUs as hosts instead of traditional x86 processors, which helps the system run more smoothly and use less electricity.
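The access pattern that SparseCore targets can be sketched in a few lines of plain Python. This is only an illustration of a sparse embedding lookup with sum pooling. The table contents and sizes are invented, and the code says nothing about how the hardware actually performs the operation.

```python
def embedding_bag(table, ids):
    """Fetch the rows of `table` listed in `ids` and sum them element-wise.

    Only len(ids) rows are touched, no matter how large the table is,
    which is what makes the workload "sparse".
    """
    dim = len(table[0])
    pooled = [0.0] * dim
    for row_id in ids:
        row = table[row_id]
        for j in range(dim):
            pooled[j] += row[j]
    return pooled


# Toy table: 1,000 rows of 2 values each. Real tables can hold
# millions or billions of rows.
table = [[float(r), float(r) * 10.0] for r in range(1000)]

pooled = embedding_bag(table, [3, 42, 999])
print(pooled)
```

The work is dominated by scattered memory reads rather than arithmetic, which is why this kind of lookup tends to be handled by dedicated hardware instead of the main matrix units.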

To start using these powerful AI chips for your own projects, you can follow these steps within the Google Cloud platform:

1. Access the Google Cloud Console by logging into your registered account and selecting your active project from the dashboard.
2. Navigate to the search bar at the top and type “Compute Engine,” then select the “TPUs” option from the dropdown menu to enter the TPU management page.
3. Click on the “Create TPU Node” button located at the top of the screen to begin the configuration process for your new hardware.
4. In the “TPU Type” dropdown menu, look for the “v8” options and choose either “tpu-v8t” for training or “tpu-v8i” for inference depending on your specific needs.
5. Select your desired “TPU software version” to ensure it matches the machine learning framework you are using, such as TensorFlow or PyTorch.
6. Configure the “Network” settings by choosing a VPC network that allows your other cloud resources to communicate with the TPU node securely.
7. Click the “Create” button at the bottom of the page and wait a few minutes for the status indicator to turn green, indicating your TPU is ready for use.
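If you prefer scripting to clicking, the console steps above have a command-line counterpart in `gcloud compute tpus tpu-vm create`. The sketch below only assembles and prints that command so you can review it before running it yourself. The resource name, zone, accelerator type, runtime version, and network are all placeholders: the exact identifiers for the TPU 8 series are an assumption here, so check what your project actually offers.

```python
import shlex


def build_tpu_create_command(name, zone, accelerator_type,
                             runtime_version, network=None):
    """Assemble a `gcloud compute tpus tpu-vm create` command as an argument list.

    Nothing is executed here; the caller decides whether to run the command.
    """
    cmd = [
        "gcloud", "compute", "tpus", "tpu-vm", "create", name,
        f"--zone={zone}",
        f"--accelerator-type={accelerator_type}",
        f"--version={runtime_version}",
    ]
    if network is not None:
        cmd.append(f"--network={network}")
    return cmd


# Every value below is a placeholder to replace with your own settings.
command = build_tpu_create_command(
    name="my-tpu",
    zone="ZONE",
    accelerator_type="ACCELERATOR_TYPE",  # the TPU type chosen in the console
    runtime_version="RUNTIME_VERSION",    # the TPU software version
    network="NETWORK_NAME",               # the VPC network
)
print(shlex.join(command))
```

Once the placeholders are filled in, you can paste the printed line into a terminal or pass the list to `subprocess.run(command, check=True)`.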

The introduction of the TPU 8 series marks a pivotal moment for Google Cloud, offering a specialized approach to AI compute that balances raw power with architectural efficiency. By providing distinct chips for training and inference, Google lets developers choose the more cost-effective option for each stage of an AI project. Whether you are building the next generation of large language models or deploying fast-response chatbots, the TPU 8 ecosystem is designed to provide the necessary scale and performance. We recommend that developers begin experimenting with the TPU 8i for inference-heavy tasks to get the most performance for their budget.