Google has announced its new A3 cloud supercomputer, which is now available in private preview.
The new powerhouse can be used to train machine learning (ML) models, continuing the tech giant’s recent push to offer cloud infrastructure for AI workloads. That push also includes the new G2, the first cloud virtual machine (VM) to use the new NVIDIA L4 Tensor Core GPU.
In a blog post, the company noted, “Google Compute Engine A3 supercomputers are purpose-built to train and serve the most demanding AI models that power today’s generative AI and large language model innovation.”
A2 vs. A3
The A3 uses the NVIDIA H100 GPU, the successor to the popular A100 that powered the previous A2. The A100 is also used to power ChatGPT, the AI writer that kickstarted the generative AI race when it launched in November 2022.
The A3 is also the first VM where the GPUs will use Google’s custom-designed 200 Gbps IPUs (infrastructure processing units), which allow for up to ten times the network bandwidth of the previous A2 VMs.
The A3 will also make use of Google’s Jupiter data center networking fabric, which can scale to tens of thousands of interconnected GPUs and “allows for full-bandwidth reconfigurable optical links that can adjust the topology on demand.”
Google also claims that the “workload bandwidth… is indistinguishable from more expensive off-the-shelf non-blocking network fabrics, resulting in a lower TCO.” The A3 also “provides up to 26 exaFlops of AI performance, which considerably improves the time and costs for training large ML models.”
When it comes to inference workloads, the day-to-day work of running a trained generative AI model, Google makes another bold claim: that the A3 achieves up to a 30x inference performance boost over the A2.
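To see what that claim would mean in practice, here is a minimal Python sketch that treats the 30x figure as a best case. The function name and the 60-second baseline are hypothetical and chosen only for illustration; real gains depend on the model and workload.

```python
def best_case_inference_time(a2_seconds: float, speedup: float = 30.0) -> float:
    """Return the shortest time a job could take on the A3, assuming the
    full claimed speedup applies (an optimistic assumption)."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return a2_seconds / speedup


# Hypothetical example: a batch that takes 60 seconds on an A2 VM.
print(best_case_inference_time(60.0))
```

With those assumed numbers, a one-minute batch on the A2 would take two seconds on the A3 at best.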
In addition to the eight H100s with 3.6 TB/s bisectional bandwidth between them, the other standout specs of the A3 include 4th Gen Intel Xeon Scalable processors and 2 TB of host memory via 4800 MHz DDR5 DIMMs.
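For a rough sense of scale, the per-VM figures quoted above can be put into a short Python sketch. The class and field names are hypothetical, and the calculation assumes decimal units (1 TB = 1,000 GB) and an even split of host memory across the GPUs, which is purely illustrative rather than a description of how the VM allocates memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class A3Specs:
    """Per-VM figures as quoted in the article (names are hypothetical)."""
    gpus: int = 8                    # NVIDIA H100 GPUs per VM
    bisection_tb_per_s: float = 3.6  # bandwidth between the eight GPUs
    host_memory_tb: float = 2.0      # DDR5 host memory

    def host_memory_per_gpu_gb(self) -> float:
        # Assumes 1 TB = 1,000 GB and an even split across GPUs.
        return self.host_memory_tb * 1000 / self.gpus


specs = A3Specs()
print(specs.host_memory_per_gpu_gb())  # 250.0
```

Under those assumptions, each H100 would have about 250 GB of host memory behind it.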
“Google Cloud’s A3 VMs, powered by next-generation NVIDIA H100 GPUs, will accelerate training and serving of generative AI applications,” said Ian Buck, vice president of hyperscale and high performance computing at NVIDIA.
In a complementary announcement at Google I/O 2023, the company also said that generative AI support in Vertex AI is now available to more customers, allowing them to build ML models on fully managed infrastructure with no maintenance required.
Customers can also deploy the A3 on Google Kubernetes Engine (GKE) and Compute Engine, which means they get support for autoscaling and workload orchestration, as well as automatic upgrades.
It seems that Google is taking the B2B approach when it comes to AI, rather than unleashing an AI for anyone to play around with, perhaps having been burnt by the inauspicious launch of its ChatGPT rival, Google Bard. However, it also announced PaLM 2 at Google I/O, the successor to its PaLM language model and supposedly more powerful than other LLMs, so we’ll have to watch this space.