Managed environments for ML.
End-to-end ML training platform. Distributed training on GPU clusters, experiment tracking, hyperparameter tuning, and model versioning.
Framework: Any · Scale: 1000+ GPUs · Tracking: Auto · Tuning: HPO
Train at scale.
1000+ GPUs. Any framework.
Any framework
PyTorch, TensorFlow, JAX, and custom frameworks.
1000+ GPU clusters
Distributed training across 1000+ GPUs with NCCL and FSDP (launch sketch below).
Experiment tracking
Automatic tracking of metrics, hyperparameters, and artifacts.
Hyperparameter tuning
Bayesian optimization, grid search, and random search (search-loop sketch below).
Model versioning
Version control for models with lineage tracking.
Data pipelines
Streaming data loaders with prefetching and caching (loader sketch below).
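To make the NCCL and FSDP line concrete, here is a minimal PyTorch sketch of the kind of job the platform runs. It is illustrative rather than platform code: the Linear layer stands in for a real model, and a launcher such as torchrun is assumed to set the rank and world-size environment variables.

# launch with, e.g.: torchrun --nproc_per_node=4 train_fsdp.py
import os

import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group(backend="nccl")  # NCCL drives the GPU collectives
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda()  # stand-in for a real model
    model = FSDP(model)  # shards parameters, gradients, and optimizer state

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).square().mean()  # dummy loss, just to drive backward()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()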
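Each tuning strategy reduces to a sample, train, and score loop. A minimal random-search sketch; the train_and_eval stub and the search space are placeholders, and a grid walk or a Bayesian sampler slots in where sample_config is.

import random

def train_and_eval(config):
    # Placeholder for a real training run; returns a fake validation score.
    return random.random()

def sample_config():
    # Random search; grid or Bayesian samplers are drop-in swaps here.
    return {
        "lr": 10 ** random.uniform(-5, -3),        # log-uniform learning rate
        "batch_size": random.choice([16, 32, 64]),
    }

best_score, best_config = float("-inf"), None
for _ in range(20):  # trial budget
    config = sample_config()
    score = train_and_eval(config)
    if score > best_score:
        best_score, best_config = score, config

print(best_score, best_config)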
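Prefetching and caching exist to keep GPUs fed while storage and decoding catch up. A generic PyTorch sketch, not platform-specific code; the in-memory cache is naive, and with multiple workers each worker process keeps its own copy.

import torch
from torch.utils.data import DataLoader, Dataset

class CachedDataset(Dataset):
    # Toy dataset: the random tensor stands in for an expensive read/decode.
    def __init__(self, n=10_000):
        self.n = n
        self._cache = {}

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        if i not in self._cache:
            self._cache[i] = torch.randn(3, 224, 224)  # cache after first load
        return self._cache[i]

loader = DataLoader(
    CachedDataset(),
    batch_size=64,
    num_workers=4,            # loading runs in parallel worker processes
    prefetch_factor=2,        # each worker keeps two batches in flight
    pin_memory=True,          # page-locked memory speeds host-to-GPU copies
    persistent_workers=True,  # keep workers alive across epochs
)

for batch in loader:
    pass  # training step goes here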
Getting started
Launch your first experiment in three steps. CLI, console, or API — your choice.
ur ml experiment create bert-ft \
  --framework=pytorch \
  --gpus=4 --gpu-type=a100
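The same launch can be scripted. A sketch assuming a hypothetical Python client: the ur module, Client class, and experiments.create signature are illustrative, mirroring the CLI flags above rather than a documented SDK.

from ur import Client  # hypothetical import; mirrors the CLI, not a real SDK

client = Client()  # assumes credentials are picked up from the environment
experiment = client.ml.experiments.create(
    name="bert-ft",        # same values as the CLI example above
    framework="pytorch",
    gpus=4,
    gpu_type="a100",
)
print(experiment.id)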
Training patterns.
LLM fine-tuning and ML research.
Suggested configuration
Multi-GPU · FSDP · Auto-track
Estimate your costs
Create detailed configurations to see exactly how much your architecture will cost. Pay for what you use, down to the second.
Distributed training. Hyperparameter tuning. Experiment tracking.