Behind every AI breakthrough is an enormous infrastructure story. Training GPT-4 reportedly cost over $100 million in compute alone. Understanding the hardware, cloud platforms, and optimisation techniques that power modern AI is essential for anyone building at scale.
CPUs are optimised for sequential tasks: a few powerful cores handling complex logic. GPUs are optimised for parallelism: thousands of simple cores executing the same operation on different data simultaneously.
Neural network training is fundamentally matrix multiplication at scale. A single training step for a large model involves multiplying matrices with billions of elements. A CPU processes these sequentially. A GPU processes thousands of elements in parallel.
An NVIDIA H100 GPU delivers roughly 4,000 TFLOPS of FP8 compute. A high-end server CPU manages perhaps 2 TFLOPS. For the embarrassingly parallel workloads of deep learning, GPUs win by three orders of magnitude.
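The three-orders-of-magnitude claim can be checked with back-of-envelope arithmetic. The sketch below uses the peak figures quoted above (~2 TFLOPS for a CPU, ~4,000 TFLOPS FP8 for an H100); real-world utilisation is far below peak, but the ratio is the point.

```python
# Rough estimate: time to multiply two large square matrices at
# CPU vs GPU peak throughput. Figures are the peak numbers quoted
# in the text, so absolute times are optimistic lower bounds.

def matmul_flops(m: int, n: int, k: int) -> float:
    """An (m x k) @ (k x n) matmul costs ~2*m*n*k floating-point ops."""
    return 2.0 * m * n * k

def seconds_at(flops: float, tflops: float) -> float:
    return flops / (tflops * 1e12)

ops = matmul_flops(16384, 16384, 16384)   # ~8.8e12 FLOPs
cpu_time = seconds_at(ops, 2)             # high-end CPU: ~2 TFLOPS
gpu_time = seconds_at(ops, 4000)          # H100 FP8 peak: ~4,000 TFLOPS

print(f"{ops:.2e} FLOPs: CPU ~{cpu_time:.2f}s, GPU ~{gpu_time*1000:.2f}ms")
```

At these peak rates the GPU finishes the same matmul roughly 2,000 times faster, which is the "three orders of magnitude" in practice.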
NVIDIA controls roughly 80-90% of the AI accelerator market. Their moat is not just hardware; it is the CUDA ecosystem. Nearly every ML framework (PyTorch, TensorFlow, JAX) is built on CUDA. Switching to a competitor means rewriting low-level kernels and accepting potential incompatibilities.
| GPU | Memory | FP8 Performance | Use Case |
|-----|--------|-----------------|----------|
| A100 | 80 GB HBM2e | 624 TFLOPS | Workhorse of current data centres |
| H100 | 80 GB HBM3 | 3,958 TFLOPS | Frontier model training |
| H200 | 141 GB HBM3e | 3,958 TFLOPS | Memory-bound LLM inference |
| B200 | 192 GB HBM3e | 9,000 TFLOPS | Next-generation training and inference |
The H100 to B200 jump represents a 2.3× performance increase, but the real bottleneck is often memory bandwidth, not raw compute. The H200's 141 GB of HBM3e memory specifically targets LLM inference where the KV cache dominates memory usage.
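To see why the KV cache dominates, it helps to size it. The sketch below uses the standard formula (keys and values stored per layer, per head, per token); the model shape is illustrative, roughly a 70B-class dense model without grouped-query attention, not a specific product's config.

```python
# KV-cache size estimate for transformer inference. The model shape
# below is illustrative (a 70B-class dense model, no grouped-query
# attention); substitute your own config.

def kv_cache_bytes(layers, heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """K and V each store (seq_len x heads x head_dim) values per layer."""
    return 2 * layers * heads * head_dim * seq_len * batch * bytes_per_elem

gib = kv_cache_bytes(layers=80, heads=64, head_dim=128,
                     seq_len=4096, batch=1) / 2**30
print(f"~{gib:.1f} GiB of FP16 KV cache per 4k-token sequence")
```

At ~10 GiB per 4k-token sequence, an 80 GB H100 holds only a handful of concurrent long contexts alongside the weights, which is exactly the pressure the 141 GB H200 relieves.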
Google's Tensor Processing Units are custom ASICs designed specifically for matrix operations. Unlike GPUs, which handle graphics and compute, TPUs are purpose-built for neural networks.
Google trains its own foundation models (Gemini, PaLM) on TPUs, demonstrating that they can compete with NVIDIA hardware at the frontier.
AMD's MI300X is the most credible NVIDIA alternative: it packs 192 GB of HBM3, more than double the H100's 80 GB.
Microsoft, Meta, and Oracle have all adopted MI300X for inference workloads. The memory advantage is particularly compelling for large language models where the KV cache is the primary bottleneck.
Intel's Gaudi 3 accelerator is also entering the mix, offering competitive training performance with a focus on enterprise price-to-performance ratios. The AI accelerator market is finally becoming a multi-vendor ecosystem.
Why is NVIDIA's competitive moat not just about hardware performance?
Few companies outside the hyperscalers can afford to build their own GPU clusters for frontier AI, so the cloud battle is fierce:
| Platform | Key Offering | Strength |
|----------|-------------|----------|
| AWS SageMaker | End-to-end ML platform | Broadest GPU selection, Inferentia custom chips |
| GCP Vertex AI | Managed ML with TPU access | TPU availability, tight BigQuery integration |
| Azure ML | Enterprise ML platform | OpenAI partnership, enterprise compliance |
| Lambda Labs | GPU cloud for AI | Simplicity, competitive H100 pricing |
| CoreWeave | GPU-native cloud | Purpose-built for AI, NVIDIA partnership |
Spot instances can reduce training costs by 60-90%. The trade-off: your training job can be interrupted at any time. Frameworks like PyTorch Lightning and DeepSpeed support checkpointing to resume training seamlessly after preemption.
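The checkpoint-and-resume pattern behind spot training can be sketched in a few lines. This is a minimal stand-in for what PyTorch Lightning or DeepSpeed handle for you; the file path, `train` loop, and fake loss are placeholders, not any real framework's API.

```python
# Minimal checkpoint/resume loop for preemptible (spot) training.
# Illustrative sketch only: "training" here is fake work, and the
# checkpoint is JSON rather than a real model state dict.
import json, os, tempfile

CKPT = os.path.join(tempfile.gettempdir(), "ckpt.json")

def save_checkpoint(state):
    # Write to a temp file then rename, so a preemption mid-write
    # cannot leave a corrupt checkpoint behind.
    tmp = CKPT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CKPT)

def load_checkpoint():
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            return json.load(f)
    return {"step": 0, "loss": None}

def train(total_steps=10, ckpt_every=2):
    state = load_checkpoint()  # resume wherever the last run was killed
    for step in range(state["step"], total_steps):
        state = {"step": step + 1, "loss": 1.0 / (step + 1)}  # fake work
        if state["step"] % ckpt_every == 0:
            save_checkpoint(state)
    return state

print(train())
```

Because every restart begins at the last saved step, the job can be preempted at any point and lose at most `ckpt_every` steps of work, which is what makes the 60-90% spot discount usable.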
AI infrastructure is now a geopolitical issue: US export controls restrict sales of the most advanced accelerators to China, and governments increasingly treat access to frontier compute as a strategic resource.
Several startups are challenging the GPU paradigm entirely: Cerebras with wafer-scale chips, Groq with deterministic low-latency inference processors, and SambaNova with reconfigurable dataflow architectures.
These are not NVIDIA replacements today. But they represent genuine architectural innovation that could reshape the market as AI workloads diversify beyond transformers.
What is the primary bottleneck for LLM inference that newer GPU designs (H200, MI300X) are specifically addressing?
Training gets the headlines, but inference is where the money goes. Common techniques that reduce inference cost include:

- Quantisation: serving weights and activations in FP8 or INT8 instead of FP16/FP32
- Continuous batching: packing many concurrent requests onto each GPU
- KV-cache management: paging and reusing attention caches across requests
- Speculative decoding: a small draft model proposes tokens that the large model verifies in a single pass
- Distillation: training a smaller model to imitate a larger one
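As a concrete example of one such technique, here is a minimal sketch of symmetric per-tensor INT8 quantisation. Production stacks (e.g. FP8 serving on H100) use per-channel scales and calibration, but the core saving is the same: each weight shrinks from 4 bytes to 1.

```python
# Toy symmetric INT8 quantisation with a single per-tensor scale.
# Illustrative sketch, not a production quantisation scheme.

def quantize_int8(weights):
    # Map the largest-magnitude weight to +/-127.
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.02, -1.27, 0.64, 0.003]
q, s = quantize_int8(w)
restored = dequantize(q, s)
err = max(abs(a - b) for a, b in zip(w, restored))
print(q, f"max round-trip error {err:.4f}")
```

The round-trip error is bounded by half the scale, which is why quantisation usually costs little accuracy while cutting memory traffic, the dominant inference bottleneck, by 4x versus FP32.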
How does speculative decoding accelerate LLM inference?
Infrastructure cost extends far beyond GPU rental. A complete picture includes:

- GPU compute (on-demand, reserved, or spot)
- High-bandwidth networking between nodes
- Storage for datasets and checkpoints
- Power, cooling, and data-centre space for on-prem deployments
- Engineering time for orchestration, monitoring, and failure recovery
At scale, companies like Meta spend over $10 billion annually on AI infrastructure. For startups, cloud costs for a single frontier model training run can exceed $1 million. Understanding and optimising this cost stack is a core competency.
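A first-order cost model makes the million-dollar figure concrete. Every number below is an illustrative assumption (the $2.50/GPU-hour rate and the 1.5× overhead multiplier for networking, storage, and engineering are placeholders, not quoted prices).

```python
# Back-of-envelope training-run cost: GPU count x hours x hourly rate,
# plus a multiplier for the rest of the cost stack. All inputs are
# illustrative assumptions, not real pricing.

def training_cost(gpus, days, usd_per_gpu_hour, overhead=1.5):
    compute = gpus * days * 24 * usd_per_gpu_hour
    return compute * overhead

# e.g. 1,024 H100s for 30 days at an assumed $2.50/GPU-hour
cost = training_cost(gpus=1024, days=30, usd_per_gpu_hour=2.50)
print(f"${cost:,.0f}")
```

Even this modest cluster, small by frontier standards, lands well past the $1 million mark, and halving it via spot pricing or better utilisation is worth an engineer's full-time attention.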