Speech-To-Text API

Real-time multilingual transcription.

Convert speech to text in 125+ languages. Real-time streaming, speaker diarization, punctuation, and custom vocabulary.

Speaker 1: "Let's review the quarterly results..."
Speaker 2: "Revenue grew 23% year over year."

Languages: 125+
Mode: Real-time
Accuracy: 97%+
Speakers: Diarization

Speech, transcribed.

125+ languages. Real-time streaming.

125+ languages

Multilingual transcription with automatic language detection.

Real-time streaming

Stream audio and get results in real time via WebSocket.
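Streaming clients usually send audio as a sequence of small binary frames rather than one large upload. The sketch below shows only the chunking step, assuming raw mono PCM input; the function name `iter_pcm_chunks`, the 16 kHz / 16-bit defaults, and the 100 ms frame size are illustrative assumptions, not documented requirements of this API. The WebSocket endpoint and message format are not given here, so sending is left out.

```python
from typing import Iterator


def iter_pcm_chunks(pcm: bytes, sample_rate: int = 16000,
                    sample_width: int = 2, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield fixed-duration slices of mono PCM audio (hypothetical helper)."""
    if min(sample_rate, sample_width, chunk_ms) <= 0:
        raise ValueError("sample_rate, sample_width and chunk_ms must be positive")
    # Whole samples per chunk, then bytes, so a chunk never splits a sample.
    chunk_bytes = (sample_rate * chunk_ms // 1000) * sample_width
    if chunk_bytes == 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(pcm), chunk_bytes):
        # The final chunk may be shorter than the others.
        yield pcm[start:start + chunk_bytes]


# One second of 16 kHz, 16-bit silence splits into ten 100 ms chunks.
chunks = list(iter_pcm_chunks(bytes(32000)))
print(len(chunks), len(chunks[0]))
```

Each yielded chunk could then be sent as one binary WebSocket message by whichever client library you use.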

Speaker diarization

Identify and label different speakers in conversation.
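As an illustration of what diarized output enables, the sketch below merges consecutive segments from the same speaker into a labeled transcript like the sample at the top of this page. The segment shape (a dict with `speaker` and `text` keys) is an assumption made for this example; the actual response format is not described here.

```python
from itertools import groupby
from operator import itemgetter


def format_transcript(segments):
    """Merge consecutive same-speaker segments into labeled lines (hypothetical helper)."""
    lines = []
    # groupby only groups adjacent items, which is what we want for turn-taking.
    for speaker, group in groupby(segments, key=itemgetter("speaker")):
        text = " ".join(seg["text"].strip() for seg in group)
        lines.append(f'Speaker {speaker}: "{text}"')
    return "\n".join(lines)


# Assumed segment shape, for illustration only.
segments = [
    {"speaker": 1, "text": "Let's review"},
    {"speaker": 1, "text": "the quarterly results..."},
    {"speaker": 2, "text": "Revenue grew 23% year over year."},
]
print(format_transcript(segments))
```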

Custom vocabulary

Add domain-specific terms and proper nouns.

Auto-punctuation

Automatic punctuation, capitalization, and formatting.

Batch processing

Transcribe large audio archives asynchronously.
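One common way to work through a large archive is to submit files concurrently with a cap on in-flight requests. The sketch below shows that pattern with `asyncio`; `transcribe_stub` is a stand-in for a real API call (the batch endpoint is not documented on this page), and `max_concurrent` is an illustrative parameter.

```python
import asyncio


async def transcribe_stub(path: str) -> str:
    """Placeholder for a real transcription request (assumption, not the real API)."""
    await asyncio.sleep(0)  # simulate yielding to the event loop during I/O
    return f"transcript of {path}"


async def transcribe_batch(paths, transcribe=transcribe_stub, max_concurrent=4):
    """Transcribe many files with at most `max_concurrent` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(path):
        async with semaphore:
            return path, await transcribe(path)

    # gather preserves input order, so results line up with `paths`.
    results = await asyncio.gather(*(worker(p) for p in paths))
    return dict(results)


results = asyncio.run(
    transcribe_batch(["a.mp3", "b.mp3", "c.mp3"], max_concurrent=2)
)
print(results)
```

Swapping `transcribe_stub` for a coroutine that calls the real service keeps the concurrency limit in one place.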

Getting started

Run your first transcription in three steps. CLI, console, or API: your choice.

Terminal
ur ai stt transcribe \
  --file=meeting.mp3 \
  --language=auto --diarize
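To call the same command from a script, you could build the argument list in Python. This sketch uses only the flags shown above (`--file`, `--language`, `--diarize`); the helper name is hypothetical, and it assumes that omitting `--diarize` turns diarization off and that `--language` accepts values other than `auto`, neither of which is confirmed here.

```python
import shlex


def build_transcribe_command(path, language="auto", diarize=True):
    """Build the argv list for the CLI call shown above (hypothetical helper)."""
    cmd = ["ur", "ai", "stt", "transcribe",
           f"--file={path}", f"--language={language}"]
    if diarize:
        # Assumption: leaving the flag out disables diarization.
        cmd.append("--diarize")
    return cmd


cmd = build_transcribe_command("meeting.mp3")
print(shlex.join(cmd))
```

With the CLI installed, the list can be passed to `subprocess.run(cmd, check=True)`.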

STT patterns.

Meeting transcription and call analytics.

Meeting transcription

Transcribe meetings with speaker identification.


Suggested configuration

Diarization · Real-time · 125+ languages

Estimate your costs

Create detailed configurations to see exactly how much your architecture will cost. Pay for what you use, down to the second.

Configuration 1

Estimated: $44.20/mo

Speech Recognition

Usage Volume (hrs)

Infrastructure (GB)

Options

Premium SLA (99.99%): +25% for guaranteed availability
Config 1 cost: $44.20

Cost details

Total: $44.20

Speech Recognition: 125+ languages. Speaker diarization. Custom vocabulary.

Configuration 1: $44.20
2× standard replica(s): $29.20
Request processing: $10.00
Storage: $5.00
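The line items add up to the estimate as follows. This sketch also works out what the Premium SLA option would cost, assuming the +25% surcharge applies to the whole configuration total; the page does not say which items it covers.

```python
from decimal import Decimal

# Line items from the cost breakdown for Configuration 1.
line_items = {
    "2x standard replicas": Decimal("29.20"),
    "Request processing": Decimal("10.00"),
    "Storage": Decimal("5.00"),
}

base_total = sum(line_items.values(), Decimal("0"))
per_replica = line_items["2x standard replicas"] / 2

# Assumption: the Premium SLA surcharge (+25%) is applied to the full total.
premium_total = (base_total * Decimal("1.25")).quantize(Decimal("0.01"))

print(f"Base total:       ${base_total}")
print(f"Per replica:      ${per_replica}")
print(f"With Premium SLA: ${premium_total}")
```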

Works seamlessly with

LLM API
Functions
Translation
IAM
Logging
Analytics

