AI Superpod Clusters

Ultra-massive GPU arrays.

Pre-built clusters of hundreds to thousands of interconnected NVIDIA H100/H200 GPUs. InfiniBand fabric, parallel storage, and job scheduling — ready to train the largest models.

[Diagram: Slurm / Kubernetes control plane scheduling training jobs over 400G InfiniBand across NVIDIA HGX H200 nodes linked by 900 GB/s NVLink; scales to 16,384+ GPUs with zero throttling]

Scale: up to 16,384 GPUs

InfiniBand: 400 Gbps

Ready for 1T+ parameter models

Uptime SLA: 99.9%

Machine families

Purpose-built configurations for every workload profile — from web serving to GPU-accelerated ML training.

SP-H200 / SP-H100

Superpod Configurations

Pre-configured GPU superpods with optimized network topology, parallel storage, and job scheduling for training the world's largest AI models.

Foundation model training · Multi-modal models · Scientific AI · Sovereign AI
View all configurations

GPU: H200 / H100

Max GPUs: 16,384

InfiniBand: fat-tree NDR

Storage: DAOS / Lustre

Infrastructure for frontier AI.

Turn-key GPU superpods with optimized networking, storage, and scheduling.

Up to 16,384 GPUs

Pre-built superpod clusters from 256 to 16,384 GPUs with optimized fat-tree InfiniBand topology for all-to-all communication.

InfiniBand fat-tree fabric

400 Gbps NDR InfiniBand with non-blocking fat-tree topology. Sub-microsecond MPI latency across the entire cluster.
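To get a feel for what 400 Gbps per link means for collective communication, here is a back-of-envelope sketch using the standard ring all-reduce cost model (the 400 Gbps figure is from this page; the model and message size are illustrative assumptions, and real runs add latency and protocol overhead):

```python
# Lower-bound all-reduce time on a 400 Gbps fabric, using the standard
# ring all-reduce model: bytes on the wire = 2 * (n - 1) / n * message size.

def allreduce_seconds(message_bytes: float, n_gpus: int, link_gbps: float = 400.0) -> float:
    """Bandwidth-only lower bound for one ring all-reduce (latency ignored)."""
    link_bytes_per_s = link_gbps * 1e9 / 8            # 400 Gbps -> 50 GB/s
    wire_bytes = 2 * (n_gpus - 1) / n_gpus * message_bytes
    return wire_bytes / link_bytes_per_s

# Example: all-reducing 1 GiB of gradients across 1,024 GPUs
t = allreduce_seconds(2**30, 1024)
print(f"{t * 1000:.1f} ms")   # -> 42.9 ms
```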

Parallel storage system

DAOS or Lustre file system delivering 1+ TB/s aggregate read throughput for checkpoint I/O and training data.
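To see why 1+ TB/s matters for checkpointing, here is a rough size-and-time estimate (my own arithmetic, not vendor figures; it assumes bf16 weights at 2 bytes/parameter and fp32 Adam state at 12 extra bytes/parameter):

```python
# Rough checkpoint-size and write-time estimate for a 1T-parameter model.
# Assumptions: bf16 weights (2 bytes/param); Adam adds fp32 master weights,
# momentum, and variance (12 bytes/param); 1 TB/s aggregate FS throughput.

def checkpoint_tb(params: float, bytes_per_param: float) -> float:
    """Checkpoint size in terabytes."""
    return params * bytes_per_param / 1e12

params = 1e12                                 # 1T parameters
weights_tb = checkpoint_tb(params, 2)         # bf16 weights only
full_tb    = checkpoint_tb(params, 2 + 12)    # weights + Adam state

throughput_tb_s = 1.0                         # 1 TB/s aggregate throughput
print(f"weights: {weights_tb:.0f} TB -> {weights_tb / throughput_tb_s:.0f} s")
print(f"full:    {full_tb:.0f} TB -> {full_tb / throughput_tb_s:.0f} s")
```

At 1 TB/s, even a full 14 TB optimizer checkpoint drains in seconds rather than minutes, which is what makes frequent checkpointing practical at this scale.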

Training frameworks

Pre-tuned Megatron-LM, DeepSpeed, and FSDP configurations. NCCL topology files generated for your cluster.
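A generated topology file typically plugs in through NCCL's environment variables. A minimal sketch (the file path and HCA device list below are hypothetical placeholders; `NCCL_TOPO_FILE` and `NCCL_IB_HCA` themselves are standard NCCL knobs):

```python
import os

# Point NCCL at a cluster-specific topology file and the InfiniBand HCAs.
# The path and device names are placeholders for illustration only.
nccl_env = {
    "NCCL_TOPO_FILE": "/etc/nccl/superpod-topo.xml",   # hypothetical path
    "NCCL_IB_HCA": "mlx5_0,mlx5_1,mlx5_2,mlx5_3",      # hypothetical HCA list
}
os.environ.update(nccl_env)
print(os.environ["NCCL_IB_HCA"])
```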

Automatic health monitoring

GPU health checks, InfiniBand link monitoring, and automatic replacement of nodes with failed GPUs.

Job scheduling

Managed Slurm scheduler with priority queues, job preemption, and multi-tenant access control.

Getting started

Launch your first instance in three steps. CLI, console, or API — your choice.

Terminal
ur superpod request \
  --config=sp-h200-256 \
  --duration=3months \
  --storage=daos-500tb

Train the largest models.

Foundation models, multi-modal AI, and sovereign AI infrastructure.

Train foundation models

Train GPT-, Llama-, and Mixtral-class models from scratch on 256- to 4,096-GPU clusters with optimized 3D parallelism.
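In 3D parallelism, the tensor-, pipeline-, and data-parallel degrees must multiply to the GPU count. A sketch of that bookkeeping (the degrees below are example choices for a hypothetical 1,024-GPU run, not a prescribed configuration):

```python
# 3D parallelism: world_size = tensor_parallel * pipeline_parallel * data_parallel.
# Example degrees for a hypothetical 70B-class run on 1,024 GPUs.

def data_parallel_degree(world_size: int, tp: int, pp: int) -> int:
    """Derive the data-parallel degree from the world size and TP/PP choices."""
    assert world_size % (tp * pp) == 0, "TP*PP must divide the GPU count"
    return world_size // (tp * pp)

world = 1024
tp, pp = 8, 4    # tensor-parallel within a node, pipeline across nodes
dp = data_parallel_degree(world, tp, pp)
print(tp, pp, dp, tp * pp * dp)   # -> 8 4 32 1024
```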

View tutorial

Suggested configuration

SP-H200-1024 · Megatron-LM · DAOS

Estimate your costs

Create detailed configurations to see exactly how much your architecture will cost. Pay for what you use, down to the second.

Configuration 1 · Estimated: $830.72/mo

Platform & Architecture

Compute Resources


Storage


Cost Optimization

Spot GPU Cluster — save up to 70% (may be reclaimed)
Config 1 cost: $830.72

Cost details


InfiniBand networking included. Volume discounts for 100+ GPUs.

Configuration 1: $830.72
8 vCPU × 80 GB compute: $630.72
Persistent storage: $200.00
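The line items sum as follows (a sketch checking the example estimate; the 70% spot figure is this page's own "up to" number, and applying it to compute only is an assumption — actual discounts vary):

```python
# Verify the example monthly estimate from the cost breakdown above.
compute = 630.72      # 8 vCPU x 80 GB compute, per month
storage = 200.00      # persistent storage, per month
total = compute + storage
print(f"${total:.2f}/mo")   # -> $830.72/mo

# Hypothetical spot pricing: assume the 70% discount applies to compute only.
spot_total = compute * (1 - 0.70) + storage
print(f"${spot_total:.2f}/mo")
```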

Works seamlessly with

GPU Instances
Parallel Storage
Model Registry
Cloud Monitoring
IAM
Cloud Logging


Train at any scale.

Pre-built GPU superpods from 256 to 16,384 GPUs. Request access today.