LLM Inference API

API access to foundation models.

Access GPT-4, Claude, LLaMA, Mistral, and Gemini through a unified API. Auto-scaling, prompt caching, and cost optimization.

[Diagram: Enterprise LLM Inference API. 1. Unified API input: POST /v1/chat/completions with a JSON body ({"model": "router-auto", "messages": [{"role": "user", "content": "Summarize..."}], "temperature": 0.7}). 2. Smart router: the optimization engine checks latency (< 200 ms), a per-token cost cap ($0.01), quality score, and available quota. 3. Model fleet: GPT-4o (OpenAI), Claude 3.5 Sonnet (Anthropic), Llama 3 70B (Meta, active), Gemma 2 27B (Google), and more. Zero lock-in.]
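The router stage in the diagram can be sketched as a constraint filter plus a quality-scoring pass. Everything below is illustrative: the model entries, latency and cost figures, and the selection rule are invented to match the diagram's labels, not taken from the actual routing implementation.

```python
# Illustrative sketch of the smart-router stage: filter the fleet by
# latency, per-token cost, and quota constraints, then pick the
# highest-quality remaining model. All figures are made up.
FLEET = [
    {"name": "gpt-4o",            "latency_ms": 320, "cost_per_tok": 0.0050, "quality": 0.95, "quota": True},
    {"name": "claude-3.5-sonnet", "latency_ms": 250, "cost_per_tok": 0.0030, "quality": 0.94, "quota": True},
    {"name": "llama-3-70b",       "latency_ms": 150, "cost_per_tok": 0.0009, "quality": 0.88, "quota": True},
    {"name": "gemma-2-27b",       "latency_ms": 120, "cost_per_tok": 0.0004, "quality": 0.80, "quota": False},
]

def route(fleet, max_latency_ms=200, cost_cap=0.01):
    candidates = [
        m for m in fleet
        if m["latency_ms"] < max_latency_ms
        and m["cost_per_tok"] <= cost_cap
        and m["quota"]
    ]
    # Among models meeting every constraint, prefer the best quality score.
    return max(candidates, key=lambda m: m["quality"])["name"] if candidates else None

print(route(FLEET))  # → llama-3-70b: meets the < 200 ms cap and has quota
```

With these toy numbers only Llama 3 70B clears the 200 ms latency cap while having available quota, matching the "active" model in the diagram.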

20+ models · < 200 ms latency · No training on your data · OpenAI-compatible API

Every model, one API.

20+ LLMs. Sub-200ms TTFT. OpenAI-compatible.

20+ foundation models

Access GPT-4, Claude, LLaMA 3, Mistral, and Gemini Pro through one API.

Sub-200ms TTFT

Time-to-first-token under 200 ms with prompt caching.

Data privacy

Your data is never used for model training. SOC 2 Type II.

OpenAI-compatible

Drop-in replacement for the OpenAI API. Keep the same SDK.
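"OpenAI-compatible" means the path, headers, and body match what an OpenAI client sends, so only the base URL changes. A minimal stdlib sketch of that wire format, assuming a hypothetical endpoint (with the official OpenAI SDK you would pass the same value as `base_url`):

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # hypothetical; use your real endpoint
API_KEY = "sk-..."                       # your API key

# Same path, headers, and JSON body an OpenAI client would send;
# only the host differs.
request = urllib.request.Request(
    url=f"{BASE_URL}/chat/completions",
    data=json.dumps({
        "model": "router-auto",
        "messages": [{"role": "user", "content": "Hello"}],
    }).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# Send with: urllib.request.urlopen(request)
print(request.full_url)
```

Because the request shape is identical, existing OpenAI tooling (SDKs, proxies, eval harnesses) works without code changes beyond the endpoint swap.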

Smart routing

Automatically route each request to the best model based on cost, latency, and quality.

Prompt caching

Cache repeated prompt prefixes to cut their cost by 75%.
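The savings are easy to sanity-check with arithmetic. The token counts and per-token price below are invented for illustration, and the assumption (not stated above) is that the 75% reduction applies to the cached prefix portion of the prompt:

```python
# Illustrative prompt-caching arithmetic. Assumption: cached prefix
# tokens bill at 25% of the normal input rate (a 75% reduction on
# the cached portion). Price and token counts are hypothetical.
PRICE_PER_1K = 0.01     # input price, $ per 1K tokens
CACHE_DISCOUNT = 0.75   # 75% off cached prefix tokens

def prompt_cost(prefix_tokens, suffix_tokens, cache_hit):
    cached_rate = PRICE_PER_1K * (1 - CACHE_DISCOUNT) if cache_hit else PRICE_PER_1K
    return (prefix_tokens * cached_rate + suffix_tokens * PRICE_PER_1K) / 1000

cold = prompt_cost(8000, 500, cache_hit=False)  # first request: no cache
warm = prompt_cost(8000, 500, cache_hit=True)   # repeat: prefix served from cache
print(f"cold=${cold:.4f} warm=${warm:.4f}")     # cold=$0.0850 warm=$0.0250
```

With a long shared prefix (a system prompt or document) and a short per-request suffix, the overall request cost drops by roughly 70% in this example, approaching 75% as the prefix dominates.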

Getting started

Create an API key and start sending requests. CLI, console, or API: your choice.

Terminal
ur ai key create my-key \
  --models=gpt4,claude,llama3

LLM patterns.

Chatbots and document analysis.

AI chatbot

Build conversational AI with multiple model fallbacks.
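Multi-model fallback can be as simple as trying a preference-ordered list and falling through on failure. The sketch below is illustrative: `call_model` is a stand-in for a real chat-completions call, and the model order is an example, not a recommendation.

```python
# Illustrative fallback chain for a chatbot: try models in preference
# order and fall through on failure. call_model is a stub that
# simulates the primary model being unavailable.
FALLBACK_ORDER = ["gpt-4", "claude-3.5-sonnet", "llama-3-70b"]

def call_model(model, prompt):
    if model == "gpt-4":
        raise RuntimeError("upstream timeout")  # simulated outage
    return f"[{model}] reply to: {prompt}"

def chat_with_fallback(prompt, models=FALLBACK_ORDER):
    last_error = None
    for model in models:
        try:
            return call_model(model, prompt)
        except Exception as err:  # in practice, catch specific API errors
            last_error = err
    raise RuntimeError("all models failed") from last_error

print(chat_with_fallback("Hi there"))  # → [claude-3.5-sonnet] reply to: Hi there
```

The smart-routing feature above gives you this behavior server-side; a client-side chain like this is useful when you want explicit control over the fallback order.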

View tutorial

Suggested configuration

GPT-4 · Claude · Smart routing

Estimate your costs

Create detailed configurations to see exactly how much your architecture will cost. Pay for what you use, down to the second.

Configuration 1

Estimated: $54.20/mo

LLM Endpoint
Usage Volume (M)
Infrastructure (GB)
Options
Premium SLA (99.99%): +25% for guaranteed availability
Config 1 cost: $54.20

Cost details

$54.20

Unified API. Prompt caching. Cost optimization router.

Configuration 1: $54.20
2× standard replicas: $29.20
Request processing: $20.00
Storage: $5.00
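The line items can be re-added to confirm the totals, and the Premium SLA option's +25% checked the same way. A quick arithmetic sketch using the figures above:

```python
# Recomputing the Configuration 1 total from its line items,
# plus the Premium SLA option's +25% uplift.
line_items = {
    "2x standard replicas": 29.20,
    "request processing": 20.00,
    "storage": 5.00,
}
base = sum(line_items.values())
with_premium_sla = round(base * 1.25, 2)  # Premium SLA adds 25%

print(f"base=${base:.2f} with_sla=${with_premium_sla:.2f}")  # base=$54.20 with_sla=$67.75
```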

Works seamlessly with

RAG
Agent
Code Gen
IAM
Logging
Analytics

