ElevenLabs API

by elevenlabs

Programmatic access to the most realistic AI audio platform. Text to speech, speech to text, voice cloning, dubbing, sound effects, music generation, and conversational AI agents, all via REST API, WebSocket, and official SDKs.

Key features

Text to Speech API
WebSocket Streaming
Speech to Text (Scribe) API
Voice Cloning API
Voice Changer (Speech to Speech) API
Sound Effects API

Pricing

Free tier available, Creator at $22/mo

Best For

Developers building voice-enabled applications, chatbots, or AI agents that need the most natural-sounding text to speech API available

Verdict

The most natural-sounding TTS API available. Eleven v3 and Multilingual v2 produce speech that is nearly indistinguishable from a human voice in most cases.

What it does

Text to Speech API

Convert text to natural speech via POST /v1/text-to-speech/{voice_id}. Choose from Eleven v3 (most expressive, 70+ languages), Multilingual v2 (most stable, 29 languages), Flash v2.5 (fastest, ~75ms latency, 32 languages), or Turbo v2.5 (balanced quality and speed). Output formats include MP3, PCM, WAV, OGG, and μ-law for Twilio. Per-request character limits range from 5,000 (v3) to 40,000 (Flash/Turbo).
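Because of the per-request character limits, longer inputs have to be split before they are sent. The following is a minimal sketch of one way to do that, assuming that breaking on sentence boundaries is acceptable for your content. The names split_text and CHAR_LIMITS are illustrative and not part of the ElevenLabs API.

```python
import re

# Per-request character limits quoted in the section above.
CHAR_LIMITS = {"v3": 5_000, "flash_turbo": 40_000}


def split_text(text: str, limit: int) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip() if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request, for example split_text(script, CHAR_LIMITS["v3"]) for Eleven v3.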

WebSocket Streaming

Real-time text to speech streaming via WebSocket at /v1/text-to-speech/{voice_id}/stream-input. Send text chunks as they are generated (e.g., from an LLM) and receive audio chunks back immediately. Supports auto mode for automatic chunking. Essential for conversational AI, live narration, and any application where latency matters.
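The sketch below shows how the text side of such a stream might be framed, without opening a connection. It rests on assumptions that should be checked against the API reference: the host wss://api.elevenlabs.io, the model_id query parameter, and a message shape in which each JSON message carries a text field, a single space opens the stream, and an empty string closes it. The function names are illustrative.

```python
import json
from urllib.parse import urlencode

WS_BASE = "wss://api.elevenlabs.io"  # assumed public host


def stream_input_url(voice_id: str, model_id: str = "eleven_flash_v2_5") -> str:
    """Build the stream-input URL for a voice (model_id query parameter is assumed)."""
    query = urlencode({"model_id": model_id})
    return f"{WS_BASE}/v1/text-to-speech/{voice_id}/stream-input?{query}"


def build_messages(text_chunks) -> list[str]:
    """Frame text chunks as JSON messages under the assumed open/close convention."""
    messages = [json.dumps({"text": " "})]  # assumed opening message
    for chunk in text_chunks:
        # Skip empty chunks: an empty string is assumed to end the stream.
        if chunk:
            messages.append(json.dumps({"text": chunk}))
    messages.append(json.dumps({"text": ""}))  # assumed closing message
    return messages
```

Keeping the framing separate from the socket code makes it easy to feed in chunks from an LLM token stream and to test the message sequence offline.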

Speech to Text (Scribe) API

Transcribe audio via POST /v1/speech-to-text. Scribe v2 supports 90+ languages with word-level timestamps, speaker diarization (up to 32 speakers), entity detection, keyterm prompting (up to 100 terms), and dynamic audio tagging. Scribe v2 Realtime provides WebSocket streaming transcription with ~150ms latency for live applications.
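Word-level timestamps and diarization are most useful once they are folded into speaker turns. The sketch below does that for an already-parsed transcription response. The response shape is an assumption, not something this page specifies: a words list whose entries carry text, start, end, and speaker_id. The function name speaker_turns is illustrative.

```python
from itertools import groupby


def speaker_turns(response: dict) -> list[tuple[str, str]]:
    """Collapse consecutive words from the same speaker into (speaker, text) turns."""
    words = response.get("words", [])
    turns = []
    for speaker, group in groupby(words, key=lambda w: w.get("speaker_id", "unknown")):
        # Ignore whitespace-only entries so spacing tokens do not create gaps.
        text = " ".join(w["text"].strip() for w in group if w["text"].strip())
        if text:
            turns.append((speaker, text))
    return turns
```

With a response containing two speakers, this returns a list such as [("speaker_0", "Hi there"), ("speaker_1", "Hello")].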

Voice Cloning API

Create custom voices programmatically. Instant cloning via POST /v1/voices/add accepts a single audio file (30 seconds minimum) and returns a usable voice ID within seconds. Professional cloning uses multiple longer samples for higher fidelity. Cloned voices work with all TTS models and support all languages.

Voice Changer (Speech to Speech) API

Transform the voice in an audio file while preserving the original speech content and timing. Uses the eleven_multilingual_sts_v2 or eleven_english_sts_v2 models. Useful for content creators, privacy applications, and character voice production.

Sound Effects API

Generate custom sound effects from text descriptions via the text_to_sound_v2 model. Describe the sound you need in natural language and receive generated audio. Royalty-free output in MP3 (44.1kHz) or WAV (48kHz).

Music Generation API

Generate studio-quality music from text prompts using the music_v1 model. Control genre, style, structure, and instrumentation. Supports vocals in multiple languages or instrumental-only output. Tracks can run up to 5 minutes, with section-level editing. Commercial use is available on Starter plans and above.

Dubbing API

Automatically dub audio and video content into 29 languages while preserving the original speaker's voice. Submit media via URL or file upload, specify target languages, and receive dubbed output. Supports MP3, MP4, WAV, and MOV formats.

Conversational AI Agents API

Build and deploy real-time voice agents via the ElevenAgents platform. The WebSocket-based API enables bidirectional audio streaming for natural conversations with sub-300ms latency. Supports tool calling, context management, and handoff to human agents. SDKs are available for Python, JavaScript, React, and Swift.

Pronunciation Dictionaries

Define custom pronunciation rules for names, acronyms, technical terms, and brand names. Attach up to 3 pronunciation dictionaries per TTS request to ensure consistent, correct pronunciation across all generated audio.
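The three-dictionary limit is easy to enforce on the client before a request is sent. The sketch below builds a TTS request body under an assumption about the field names: a pronunciation_dictionary_locators list whose entries hold pronunciation_dictionary_id and version_id. Verify these against the API reference. The function name tts_payload is illustrative.

```python
MAX_DICTIONARIES = 3  # per-request limit quoted above


def tts_payload(text: str, model_id: str, dictionaries=()) -> dict:
    """Build a TTS request body; `dictionaries` is a sequence of (dictionary_id, version_id)."""
    locators = [
        {"pronunciation_dictionary_id": dictionary_id, "version_id": version_id}
        for dictionary_id, version_id in dictionaries
    ]
    if len(locators) > MAX_DICTIONARIES:
        raise ValueError(f"at most {MAX_DICTIONARIES} pronunciation dictionaries per request")
    payload = {"text": text, "model_id": model_id}
    if locators:
        # Field names here are assumed, see the note above.
        payload["pronunciation_dictionary_locators"] = locators
    return payload
```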

Pricing

Free

Free

10,000 credits per month. API access included. Good for prototyping and testing all endpoints.

  • 10,000 credits per month (~10 min TTS with Multilingual v2)
  • ~20 min TTS with Flash models
  • 2.5 hours speech to text
  • API key access
  • Up to 3 custom voices
  • All core API endpoints
  • 2 concurrent requests (Multilingual), 4 (Flash/Turbo)

Starter

$5/month

30,000 credits per month. Instant voice cloning and commercial license for API usage.

  • 30,000 credits per month (~30 min Multilingual, ~60 min Flash)
  • 12.5 hours speech to text
  • Instant voice cloning via API
  • Commercial license
  • Up to 10 custom voices
  • 3 concurrent requests (Multilingual), 6 (Flash/Turbo)

Best Value

Creator

$22/month

100,000 credits per month. Professional voice cloning and higher rate limits.

  • 100,000 credits per month (~100 min Multilingual, ~200 min Flash)
  • ~63 hours speech to text
  • Professional voice cloning
  • 192kbps audio quality via API
  • Up to 30 custom voices
  • 5 concurrent requests (Multilingual), 10 (Flash/Turbo)

Most popular

Pro

$99/month

500,000 credits per month. 44.1kHz PCM audio output and higher concurrency.

  • 500,000 credits per month (~500 min Multilingual, ~1,000 min Flash)
  • ~300 hours speech to text
  • 44.1kHz PCM audio output via API
  • Up to 160 custom voices
  • 10 concurrent requests (Multilingual), 20 (Flash/Turbo)
  • Priority queue (level 5)

Scale

$330/month

2,000,000 credits per month. Team collaboration with 3 workspace seats.

  • 2,000,000 credits per month (~2,000 min Multilingual, ~4,000 min Flash)
  • ~1,100 hours speech to text
  • 3 workspace seats included
  • Up to 660 custom voices
  • 15 concurrent requests (Multilingual), 30 (Flash/Turbo)
  • Priority queue (level 5)

Business

$1,320/month

11,000,000 credits per month. Lowest per unit costs and 5 workspace seats.

  • 11,000,000 credits per month (~11,000 min Multilingual, ~22,000 min Flash)
  • ~6,000 hours speech to text
  • 5 workspace seats included
  • Lowest API pricing per character/minute
  • 15 concurrent requests (Multilingual), 30 (Flash/Turbo)
  • 3 professional voice clones included

Enterprise

Custom

Custom pricing with elevated concurrency, custom SSO, SLAs, DPA, BAAs for HIPAA, and dedicated support.

  • Custom credit allocation
  • Elevated concurrency limits
  • Custom SSO and admin controls
  • DPA and SLA agreements
  • BAAs for HIPAA customers
  • ElevenStudios fully managed dubbing
  • Significant volume discounts
  • Priority support and dedicated account manager
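To compare tiers, it helps to turn each plan into an effective price per 1,000 credits. The sketch below uses only the monthly prices and credit allocations listed above. It assumes the full allocation is used and ignores any overage pricing, which this page does not list. The names PLANS and usd_per_1k_credits are illustrative.

```python
# (monthly price in USD, monthly credits), taken from the plan list above.
PLANS = {
    "Starter": (5, 30_000),
    "Creator": (22, 100_000),
    "Pro": (99, 500_000),
    "Scale": (330, 2_000_000),
    "Business": (1320, 11_000_000),
}


def usd_per_1k_credits(plan: str) -> float:
    """Effective price per 1,000 credits if the whole allocation is used."""
    price, credits = PLANS[plan]
    return price / credits * 1000


if __name__ == "__main__":
    for name in PLANS:
        print(f"{name}: ${usd_per_1k_credits(name):.3f} per 1K credits")
```

The minute estimates in the plan lists suggest Flash models stretch the same credits about twice as far, so the effective price per 1,000 characters on Flash would be roughly half these figures.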

Pros & Cons

Pros

  • The most natural-sounding TTS API available. Eleven v3 and Multilingual v2 produce speech that is nearly indistinguishable from a human voice in most cases.
  • Comprehensive API surface: TTS, STT, voice cloning, dubbing, sound effects, music, voice changer, voice isolator, and conversational AI agents all accessible from a single API key.
  • Ultra-low-latency WebSocket streaming (~75ms with Flash v2.5) makes it suitable for real-time conversational applications and live voice agents.
  • Official SDKs for Python, TypeScript, React, Swift, and Unity (C#) with clean, well-documented interfaces. Getting started takes minutes.
  • Flexible model selection: choose between highest quality (v3/Multilingual v2), lowest latency (Flash v2.5), or balanced (Turbo v2.5) depending on your use case.
  • Voice cloning from a single 30-second audio sample lets developers create custom voices for their applications programmatically.
  • Credit-based pricing with tiered plans means you can start free and scale predictably. Flash models cost roughly half the credits of premium models.

Cons

  • Character-based pricing gets expensive at high volumes. At the Business tier ($1,320/mo), TTS costs $0.12 per 1K characters for Multilingual v2, which is significantly more than Amazon Polly or Google Cloud TTS.
  • Eleven v3 (the best model) has a 5,000 character limit per request, requiring text splitting for longer content. Flash and Turbo models allow up to 40,000 characters.
  • Concurrency limits are relatively low on lower tiers (2 concurrent Multilingual v2 requests on Free, 3 on Starter). High-traffic applications need the Scale tier or above.
  • Credits do not roll over beyond two months, and unused credits expire if you downgrade or cancel your subscription.
  • Text normalization (numbers, dates, currencies) is disabled by default on Flash v2.5 to maintain low latency. Enabling it requires an Enterprise plan; otherwise, preprocess the text before sending it to the API.
  • Enterprise and ElevenAgents pricing for conversational AI is not publicly transparent. Building production voice agents at scale requires contacting sales.
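The concurrency limits above can be respected on the client by capping the number of requests in flight. The following is a minimal sketch with asyncio.Semaphore. The jobs are stand-ins for real API calls, and run_limited and demo are illustrative names.

```python
import asyncio


async def run_limited(jobs, limit: int):
    """Run zero-argument coroutine functions with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


def demo(limit: int = 2, job_count: int = 6) -> int:
    """Run fake jobs and return the peak number that were in flight at once."""
    state = {"active": 0, "peak": 0}

    async def fake_request():
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # stands in for network latency
        state["active"] -= 1

    asyncio.run(run_limited([fake_request] * job_count, limit))
    return state["peak"]
```

Setting limit to the plan's concurrency figure (for example 2 for Multilingual v2 on the Free tier) keeps a batch job within the allowance instead of relying on server-side rejections.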

How to get started

1. Get your API key

Create a free ElevenLabs account and navigate to Settings > API Keys in the dashboard. Copy your API key and store it as an environment variable (ELEVENLABS_API_KEY). The free tier gives you 10,000 credits per month to test all API endpoints.
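Reading the key from the environment in one place makes a missing key fail early with a clear message. This is a minimal sketch, and the function name load_api_key is illustrative.

```python
import os


def load_api_key(env=os.environ) -> str:
    """Return the ElevenLabs API key from the environment, or fail with a clear error."""
    key = env.get("ELEVENLABS_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set ELEVENLABS_API_KEY before calling the API")
    return key
```

Passing a plain dict as env makes the function easy to test without touching the real environment.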

2. Install the SDK

Install the official SDK for your language. Python: pip install elevenlabs. TypeScript/Node: npm install @elevenlabs/elevenlabs-js (the SDK was previously published as elevenlabs on npm). The SDKs handle authentication, request formatting, and response parsing. Optionally install python-dotenv (Python) or dotenv (Node) to load environment variables from a .env file.

3. Make your first TTS request

Initialize the client with your API key and call text_to_speech.convert() with text, a voice_id, and a model_id. The default model is eleven_multilingual_v2. For lowest latency, use eleven_flash_v2_5. The response is an audio stream you can play or save to a file.
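For readers who want to see what the SDK call amounts to, the sketch below builds the equivalent REST request with the standard library only. It assumes the host https://api.elevenlabs.io and the xi-api-key header. The function names are illustrative, and the voice ID is whatever you pass in. Building the request performs no network call; only synthesize_to_file does.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io"  # assumed public host


def build_tts_request(api_key: str, voice_id: str, text: str,
                      model_id: str = "eleven_multilingual_v2") -> urllib.request.Request:
    """Build a POST request for /v1/text-to-speech/{voice_id} without sending it."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v1/text-to-speech/{voice_id}",
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request: urllib.request.Request, path: str) -> None:
    """Send the request and write the returned audio bytes to `path`."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Switching model_id to eleven_flash_v2_5 is the only change needed to trade some quality for lower latency.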

4. Browse available voices

Use GET /v1/voices to list all available voices, or browse the voice library at elevenlabs.io/voice-library. Each voice has a unique voice_id you pass to the TTS endpoint. Try different voices to find the right match for your application. You can also clone a voice by uploading audio via POST /v1/voices/add.
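A small lookup from voice name to voice_id makes the listing easier to use in code. The sketch below assumes the host https://api.elevenlabs.io, the xi-api-key header, and a response body with a voices list whose entries carry name and voice_id. The function names are illustrative.

```python
import urllib.request

API_BASE = "https://api.elevenlabs.io"  # assumed public host


def build_voices_request(api_key: str) -> urllib.request.Request:
    """Build a GET request for /v1/voices without sending it."""
    return urllib.request.Request(
        f"{API_BASE}/v1/voices",
        headers={"xi-api-key": api_key},
        method="GET",
    )


def voices_by_name(response: dict) -> dict[str, str]:
    """Map each voice name to its voice_id (response shape is assumed)."""
    return {voice["name"]: voice["voice_id"] for voice in response.get("voices", [])}
```

If two voices share a name, the later one wins in this mapping, so a production version might key on voice_id instead.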

5. Explore advanced endpoints

Try WebSocket streaming for real-time TTS, Scribe v2 for speech to text transcription, the conversational AI agents API for building voice bots, or the dubbing API for video localization. The API reference covers every endpoint with request/response examples.

Last updated: 2026-02-21