NVIDIA H200
The NVIDIA H200 extends the H100 architecture with 141GB of HBM3e memory (1.76× the H100's 80GB) and 4.8TB/s of memory bandwidth (1.4× the H100 SXM's 3.35TB/s). It is designed for: (1) training large models (100B+ parameters) with far less model-parallelism overhead, (2) long-context inference (1M+ token context windows), and (3) larger batch sizes for better inference efficiency. Compute throughput is the same as the H100, but the much larger memory enables new use cases. Available from Q2 2024 on major clouds. Use cases: training and serving models such as GPT-4 and Claude with extended context, multi-modal models combining vision and language, and scientific simulations requiring massive memory.
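
To make the memory claim concrete, here is a rough back-of-envelope estimate (a minimal sketch, not an NVIDIA tool) of how much state mixed-precision training with Adam requires per parameter and how that compares to a single 141GB H200. Activation memory, which depends on batch size and sequence length, is deliberately excluded.

```python
# Rough training-state estimate: bf16 weights + bf16 grads + fp32 master
# weights + two fp32 Adam moments = 16 bytes per parameter.
# Activations are excluded; assumes no optimizer-state sharding.

H200_GB = 141

def training_state_gb(params_billion: float) -> float:
    bytes_per_param = 2 + 2 + 4 + 4 + 4   # weights, grads, master copy, Adam m, Adam v
    return params_billion * 1e9 * bytes_per_param / 1e9

for size in (7, 13, 70):
    need = training_state_gb(size)
    print(f"{size:>3}B params -> ~{need:,.0f} GB of training state "
          f"({need / H200_GB:.1f}x one H200)")
```

The point of the exercise: a 7B model's full training state fits on one H200 (before activations), while a 70B model still needs its optimizer state sharded across a node, just across fewer GPUs and with less tensor parallelism than 80GB parts require.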

Overview
The H200 addresses the memory bottleneck for cutting-edge AI: train 70B-class models with far less model-parallel sharding, serve 1M+ token context inference, and run larger batch sizes for better GPU utilization. It has the same Tensor Cores and Transformer Engine as the H100, but 141GB of memory (vs 80GB) opens new possibilities, and 4.8TB/s of bandwidth (vs 3.35TB/s on the H100 SXM) eases memory-bound bottlenecks. It is especially valuable for inference serving (fit more concurrent requests in memory), scientific computing (larger simulations), and multi-modal models (process more images and video frames simultaneously).
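
As a sketch of why the bandwidth increase matters for inference, the snippet below estimates the memory-bandwidth lower bound on per-token decode latency, assuming batch-size-1 autoregressive decoding must stream all model weights from HBM once per generated token (the usual rough model; real numbers depend on kernel efficiency, KV-cache reads, and batching).

```python
# Memory-bound lower bound on decode latency: weights are read from HBM
# once per generated token at batch size 1. Illustrative, not measured.

def min_ms_per_token(params_billion: float, bytes_per_param: int, bw_tb_per_s: float) -> float:
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return weight_bytes / (bw_tb_per_s * 1e12) * 1e3

for name, bw in (("H100 SXM (3.35 TB/s)", 3.35), ("H200 (4.8 TB/s)", 4.8)):
    t = min_ms_per_token(70, 2, bw)   # 70B model, fp16/bf16 weights
    print(f"{name}: >= {t:.1f} ms/token (<= {1000 / t:.0f} tokens/s per GPU)")
```

The ~1.4× bandwidth ratio translates directly into the same ratio of best-case decode throughput for memory-bound workloads.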
Key Specifications
- **Memory**: 141GB HBM3e, 4.8TB/s bandwidth
- **Compute**: Same as the H100: 1,979 TFLOPS dense FP8
- **Memory Advantage**: 1.76× the H100's 80GB capacity
- **Bandwidth**: 1.4× the H100 SXM's 3.35TB/s (see the roofline sketch after this list)
- **Power/Form Factor**: Similar to H100 SXM5
- **Availability**: From Q2 2024 on major clouds (AWS, GCP, Azure)
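
Because FP8 throughput is unchanged while bandwidth grows, the roofline balance point shifts: kernels need fewer FLOPs per byte before they stop being memory-bound. A small sketch using the numbers from the list above (dense FP8 peak and HBM bandwidth; H100 SXM figures for comparison):

```python
# Roofline balance point = peak FLOP/s divided by memory bandwidth (bytes/s).
# Kernels below this arithmetic intensity are memory-bound.

def balance_flops_per_byte(peak_tflops: float, bw_tb_per_s: float) -> float:
    return (peak_tflops * 1e12) / (bw_tb_per_s * 1e12)

for name, tflops, bw in (("H100 SXM", 1979, 3.35), ("H200", 1979, 4.8)):
    print(f"{name}: {balance_flops_per_byte(tflops, bw):.0f} FP8 FLOPs/byte "
          "to reach the compute roof")
```

Memory-bound kernels (attention during decode, embedding lookups, most elementwise ops) therefore see close to the full 1.4× bandwidth uplift.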
Use Cases
- **Long Context Inference**: Serve 1M+ token contexts for frontier models (Claude, GPT-4 class)
- **Larger Batches**: Fit roughly 2× more concurrent inference requests in memory (see the capacity sketch at the end of this section)
- **Training**: 70B-175B models with far less model-parallelism complexity
- **Multi-Modal**: Process more images/video frames simultaneously
- **Scientific Computing**: Molecular dynamics, weather simulation with larger datasets
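
To illustrate the long-context and batch-size bullets above, here is a rough capacity estimate of how many concurrent requests fit in HBM once weights are resident. The model dimensions (80 layers, 8 KV heads of dimension 128, FP8 weights, fp16 KV cache) are illustrative assumptions, not measurements of any specific model.

```python
# KV-cache capacity sketch: concurrent requests that fit after model weights.
# Model dimensions below are illustrative (roughly a 70B-class model with GQA).

def kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128, dtype_bytes=2):
    return 2 * layers * kv_heads * head_dim * dtype_bytes   # K and V caches

def max_concurrent_requests(hbm_gb, weight_gb, context_len):
    free_bytes = (hbm_gb - weight_gb) * 1e9
    return max(0, int(free_bytes // (kv_bytes_per_token() * context_len)))

WEIGHT_GB = 70   # 70B parameters quantized to FP8
for hbm in (80, 141):   # H100 vs H200
    n = max_concurrent_requests(hbm, WEIGHT_GB, context_len=32_000)
    print(f"{hbm} GB HBM: {n} concurrent 32K-token requests")
```

Under these assumptions, an 80GB card has too little headroom after the weights for even one 32K-token KV cache, while the H200's extra capacity is what makes long-context, multi-request serving practical on a single GPU.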