Model Training Studio

Managed environments for ML.

End-to-end ML training platform. Distributed training on GPU clusters, experiment tracking, hyperparameter tuning, and model versioning.


Any framework · 1000+ GPU scale · Auto tracking · HPO tuning

Train at scale.

1000+ GPUs. Any framework.

Any framework

PyTorch, TensorFlow, JAX, and custom frameworks.

1000+ GPU clusters

Distributed training across 1000+ GPUs with NCCL/FSDP.
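The core collective behind NCCL-backed data parallelism is all-reduce: each worker computes gradients on its own data shard, then the gradients are averaged so every worker applies the same update. A toy, dependency-free sketch of that averaging step (the worker count and gradient values are made up for illustration):

```python
def all_reduce_mean(per_worker_grads):
    """Average each gradient position across workers,
    mimicking what NCCL's all-reduce does across GPUs."""
    n_workers = len(per_worker_grads)
    return [sum(g[i] for g in per_worker_grads) / n_workers
            for i in range(len(per_worker_grads[0]))]

# Gradients computed independently on 4 workers (one data shard each)
grads = [
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
    [2.0, 4.0],
]
print(all_reduce_mean(grads))  # every worker ends up with [2.0, 4.0]
```

In real FSDP training this happens per parameter shard on the GPU interconnect; the arithmetic is the same.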

Experiment tracking

Automatic tracking of metrics, hyperparams, and artifacts.
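Automatic tracking boils down to recording a time series per metric name, plus the hyperparameters of the run. A minimal, self-contained sketch of such a tracker (the class name and API are illustrative, not the platform's actual SDK):

```python
class ExperimentTracker:
    """Toy metric tracker: stores (step, value) pairs per metric name."""

    def __init__(self, run_name):
        self.run_name = run_name
        self.metrics = {}   # metric name -> list of (step, value)
        self.params = {}    # hyperparameters, logged once per run

    def log_params(self, **params):
        self.params.update(params)

    def log_metric(self, name, value, step):
        self.metrics.setdefault(name, []).append((step, value))

    def best(self, name, mode="min"):
        """Best value seen for a metric (e.g. lowest validation loss)."""
        values = [v for _, v in self.metrics[name]]
        return min(values) if mode == "min" else max(values)

run = ExperimentTracker("bert-ft")
run.log_params(lr=3e-5, batch_size=32)
for step, loss in enumerate([0.9, 0.5, 0.3, 0.2]):
    run.log_metric("train_loss", loss, step)
print(run.best("train_loss"))  # 0.2
```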

Hyperparameter tuning

Bayesian optimization, grid search, and random search.
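Of the three, random search is the simplest to sketch: sample configurations from the search space and keep the best. A dependency-free illustration with a made-up objective standing in for validation loss:

```python
import random

def objective(lr, batch_size):
    """Stand-in for a validation loss; real use would train a model."""
    return (lr - 0.01) ** 2 + 0.001 * abs(batch_size - 64)

def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    best_cfg, best_loss = None, float("inf")
    for _ in range(n_trials):
        cfg = {
            "lr": 10 ** rng.uniform(-4, -1),   # log-uniform over [1e-4, 1e-1]
            "batch_size": rng.choice([16, 32, 64, 128]),
        }
        loss = objective(**cfg)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss

cfg, loss = random_search(n_trials=50)
print(cfg, loss)
```

Grid search enumerates the cartesian product of the same space instead of sampling it; Bayesian optimization replaces the random sampler with a model that proposes promising configurations.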

Model versioning

Version control for models with lineage tracking.

Data pipelines

Streaming data loaders with prefetching and caching.
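Prefetching overlaps data loading with compute: a background thread keeps a bounded queue of batches filled so the training loop never waits on I/O. A minimal stdlib-only sketch of the pattern:

```python
import queue
import threading

def prefetch(batches, buffer_size=4):
    """Yield items from `batches` while a background thread stays
    ahead, keeping up to `buffer_size` batches ready in a queue."""
    q = queue.Queue(maxsize=buffer_size)
    _END = object()

    def producer():
        for batch in batches:
            q.put(batch)   # blocks when the buffer is full
        q.put(_END)

    threading.Thread(target=producer, daemon=True).start()
    while (item := q.get()) is not _END:
        yield item

# Usage: wrap any iterable of batches (here, fake batch IDs)
for batch in prefetch(range(3)):
    print(batch)  # prints 0, 1, 2; each was loaded ahead of consumption
```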

Getting started

Launch your first experiment in three steps, using the CLI, console, or API. Your choice.

Terminal
ur ml experiment create bert-ft \
  --framework=pytorch \
  --gpus=4 --gpu-type=a100

Training patterns.

LLM fine-tuning and ML research.

LLM fine-tuning

Fine-tune foundation models on custom data.


Suggested configuration

Multi-GPU · FSDP · Auto-track

Estimate your costs

Create detailed configurations to see exactly how much your architecture will cost. Pay for what you use, down to the second.

Configuration 1

Estimated: $39.20/mo


Options

Premium SLA (99.99%): +25% for guaranteed availability

Cost details

Distributed training. Hyperparameter tuning. Experiment tracking.

Configuration 1: $39.20/mo
2× standard replicas: $29.20
Request processing: $5.00
Storage: $5.00
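The estimate is the sum of the line items above, and the Premium SLA option multiplies the total by 1.25. A quick check of the numbers:

```python
line_items = {
    "2x standard replicas": 29.20,
    "request processing": 5.00,
    "storage": 5.00,
}
base = sum(line_items.values())
print(f"${base:.2f}/mo")                                  # $39.20/mo
print(f"${base * 1.25:.2f}/mo with Premium SLA (+25%)")   # $49.00/mo
```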

Works seamlessly with

Model Registry
MLOps
S3 Data
IAM
Monitoring
Tracking
