ElevenLabs API
API, by ElevenLabs
Programmatic access to the most realistic AI audio platform. Text to speech, speech to text, voice cloning, dubbing, sound effects, music generation, and conversational AI agents, all via REST API, WebSocket, and official SDKs.
Key features
Free tier available, Creator at $22/mo
Best for developers building voice enabled applications, chatbots, or AI agents that need the most natural sounding text to speech API available.
The most natural sounding TTS API available. Eleven v3 and Multilingual v2 produce speech that is nearly indistinguishable from human voice in most cases.
What it does
Text to Speech API
Convert text to natural speech via POST /v1/text-to-speech/{voice_id}. Choose from Eleven v3 (most expressive, 70+ languages), Multilingual v2 (most stable, 29 languages), Flash v2.5 (fastest, ~75ms latency, 32 languages), or Turbo v2.5 (balanced quality and speed). Output formats include MP3, PCM, WAV, OGG, and μ-law (mu-law) for Twilio. Per-request character limits range from 5,000 (v3) to 40,000 (Flash/Turbo).
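Because the per-request limits differ by model, longer text has to be split before it is sent. The sketch below is one way to do that at sentence boundaries. It uses only the limits quoted above. The model IDs eleven_v3 and eleven_turbo_v2_5 are assumed from the naming pattern of eleven_flash_v2_5, and split_for_model is a hypothetical helper, not part of any SDK.

```python
import re

# Per-request character limits quoted in the text above.
# Multilingual v2 is left out because the text does not state its limit.
# "eleven_v3" and "eleven_turbo_v2_5" are assumed model IDs.
CHAR_LIMITS = {
    "eleven_v3": 5_000,
    "eleven_flash_v2_5": 40_000,
    "eleven_turbo_v2_5": 40_000,
}


def split_for_model(text: str, model_id: str) -> list:
    """Split text into chunks that each fit the model's per-request limit."""
    limit = CHAR_LIMITS[model_id]
    # Break after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    current = ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request and the resulting audio concatenated in order.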
WebSocket Streaming
Real time text to speech streaming via WebSocket at /v1/text-to-speech/{voice_id}/stream-input. Send text chunks as they are generated (e.g., from an LLM) and receive audio chunks back immediately. Supports auto mode for automatic chunking. Essential for conversational AI, live narration, and any application where latency matters.
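To make the chunked flow concrete, here is a minimal sketch of the messages a client might send over the stream-input socket and how it might decode a reply. The message shapes are assumptions rather than a confirmed specification: a single space opens the stream, an empty string closes it, and replies carry base64 audio in an "audio" field. Both function names are hypothetical. The sketch only builds and parses JSON; opening the socket is left to a WebSocket client of your choice.

```python
import base64
import json
from typing import Iterable, Iterator


def stream_input_messages(text_chunks: Iterable[str]) -> Iterator[str]:
    """Yield the JSON messages for one stream-input session."""
    # Assumed protocol: a single space opens the stream ...
    yield json.dumps({"text": " "})
    for chunk in text_chunks:
        if chunk:  # skip empty chunks so the stream is not closed early
            yield json.dumps({"text": chunk})
    # ... and an empty string tells the server that no more text follows.
    yield json.dumps({"text": ""})


def decode_audio(message: str) -> bytes:
    """Return the audio bytes in one server message, or b'' if it has none."""
    payload = json.loads(message)
    audio = payload.get("audio")
    return base64.b64decode(audio) if audio else b""
```

Feeding LLM output into stream_input_messages as it arrives lets playback begin before the full response has been generated.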
Speech to Text (Scribe) API
Transcribe audio via POST /v1/speech-to-text. Scribe v2 supports 90+ languages with word level timestamps, speaker diarization (up to 32 speakers), entity detection, keyterm prompting (up to 100 terms), and dynamic audio tagging. Scribe v2 Realtime provides WebSocket streaming transcription with ~150ms latency for live applications.
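Word level timestamps and diarization are usually easier to work with once consecutive words from the same speaker are merged into turns. The sketch below assumes each word is a dict with text, start, end, and speaker_id keys, plus an optional type key that marks non-word tokens such as spacing. Those field names are assumptions about the response shape, and speaker_turns is a hypothetical helper.

```python
from itertools import groupby


def speaker_turns(words: list) -> list:
    """Merge consecutive words from the same speaker into turns."""
    # Keep only real words; tokens without a "type" key are treated as words.
    spoken = [w for w in words if w.get("type", "word") == "word"]
    turns = []
    for speaker, group in groupby(spoken, key=lambda w: w["speaker_id"]):
        items = list(group)
        turns.append({
            "speaker_id": speaker,
            "start": items[0]["start"],   # start time of the first word
            "end": items[-1]["end"],      # end time of the last word
            "text": " ".join(w["text"] for w in items),
        })
    return turns
```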
Voice Cloning API
Create custom voices programmatically. Instant cloning via POST /v1/voices/add accepts a single audio file (30 seconds minimum) and returns a usable voice ID within seconds. Professional cloning uses multiple longer samples for higher fidelity. Cloned voices work with all TTS models and support all languages.
Voice Changer (Speech to Speech) API
Transform the voice in an audio file while preserving the original speech content and timing. Uses the eleven_multilingual_sts_v2 or eleven_english_sts_v2 models. Useful for content creators, privacy applications, and character voice production.
Sound Effects API
Generate custom sound effects from text descriptions via the text_to_sound_v2 model. Describe the sound you need in natural language and receive generated audio. Royalty free output in MP3 (44.1kHz) or WAV (48kHz).
Music Generation API
Generate studio quality music from text prompts using the music_v1 model. Control genre, style, structure, and instrumentation. Supports vocals in multiple languages or instrumental only. Up to 5 minute duration with section level editing. Commercial use available on Starter plans and above.
Dubbing API
Automatically dub audio and video content into 29 languages while preserving the original speaker's voice. Submit media via URL or file upload, specify target languages, and receive dubbed output. Supports MP3, MP4, WAV, and MOV formats.
Conversational AI Agents API
Build and deploy real time voice agents via the ElevenAgents platform. The WebSocket based API enables bidirectional audio streaming for natural conversations with sub-300ms latency. It supports tool calling, context management, and handoff to human agents. SDKs are available for Python, JavaScript, React, and Swift.
Pronunciation Dictionaries
Define custom pronunciation rules for names, acronyms, technical terms, and brand names. Attach up to 3 pronunciation dictionaries per TTS request to ensure consistent, correct pronunciation across all generated audio.
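A small guard on the client side can enforce the three dictionary limit before a request is sent. In the sketch below, the field names pronunciation_dictionary_locators, pronunciation_dictionary_id, and version_id are assumptions about the request body, and tts_body is a hypothetical helper.

```python
MAX_DICTIONARIES = 3  # limit stated in the text above


def tts_body(text: str, model_id: str, dictionaries: list) -> dict:
    """Build a TTS request body with (dictionary_id, version_id) pairs attached."""
    if len(dictionaries) > MAX_DICTIONARIES:
        raise ValueError(
            f"at most {MAX_DICTIONARIES} pronunciation dictionaries per request"
        )
    body = {"text": text, "model_id": model_id}
    if dictionaries:
        # Field names below are assumed, not confirmed by the text.
        body["pronunciation_dictionary_locators"] = [
            {"pronunciation_dictionary_id": dict_id, "version_id": version_id}
            for dict_id, version_id in dictionaries
        ]
    return body
```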
Pricing
Free
10,000 credits per month. API access included. Good for prototyping and testing all endpoints.
- 10,000 credits per month (~10 min TTS with Multilingual v2)
- ~20 min TTS with Flash models
- 2.5 hours speech to text
- API key access
- Up to 3 custom voices
- All core API endpoints
- 2 concurrent requests (Multilingual), 4 (Flash/Turbo)
Starter
30,000 credits per month. Instant voice cloning and commercial license for API usage.
- 30,000 credits per month (~30 min Multilingual, ~60 min Flash)
- 12.5 hours speech to text
- Instant voice cloning via API
- Commercial license
- Up to 10 custom voices
- 3 concurrent requests (Multilingual), 6 (Flash/Turbo)
Creator
100,000 credits per month. Professional voice cloning and higher rate limits.
- 100,000 credits per month (~100 min Multilingual, ~200 min Flash)
- ~63 hours speech to text
- Professional voice cloning
- 192kbps audio quality via API
- Up to 30 custom voices
- 5 concurrent requests (Multilingual), 10 (Flash/Turbo)
Pro
500,000 credits per month. 44.1kHz PCM audio output and higher concurrency.
- 500,000 credits per month (~500 min Multilingual, ~1,000 min Flash)
- ~300 hours speech to text
- 44.1kHz PCM audio output via API
- Up to 160 custom voices
- 10 concurrent requests (Multilingual), 20 (Flash/Turbo)
- Priority queue (level 5)
Scale
2,000,000 credits per month. Team collaboration with 3 workspace seats.
- 2,000,000 credits per month (~2,000 min Multilingual, ~4,000 min Flash)
- ~1,100 hours speech to text
- 3 workspace seats included
- Up to 660 custom voices
- 15 concurrent requests (Multilingual), 30 (Flash/Turbo)
- Priority queue (level 5)
Business
11,000,000 credits per month. Lowest per unit costs and 5 workspace seats.
- 11,000,000 credits per month (~11,000 min Multilingual, ~22,000 min Flash)
- ~6,000 hours speech to text
- 5 workspace seats included
- Lowest API pricing per character/minute
- 15 concurrent requests (Multilingual), 30 (Flash/Turbo)
- 3 professional voice clones included
Enterprise
Custom pricing with elevated concurrency, custom SSO, SLAs, DPA, BAAs for HIPAA, and dedicated support.
- Custom credit allocation
- Elevated concurrency limits
- Custom SSO and admin controls
- DPA and SLA agreements
- BAAs for HIPAA customers
- ElevenStudios fully managed dubbing
- Significant volume discounts
- Priority support and dedicated account manager
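The plan figures above imply roughly 1,000 credits per minute of Multilingual v2 audio and roughly 500 per minute with Flash models (10,000 credits is listed as ~10 and ~20 minutes respectively). The sketch below turns that into a rough plan picker. The per minute rates are derived approximations, not published prices, and both function names are hypothetical.

```python
from typing import Optional

# Monthly credits per plan, as listed above (ascending order).
PLAN_CREDITS = {
    "Free": 10_000,
    "Starter": 30_000,
    "Creator": 100_000,
    "Pro": 500_000,
    "Scale": 2_000_000,
    "Business": 11_000_000,
}

# Approximate credits per minute of audio, derived from the plan figures.
CREDITS_PER_MINUTE = {"multilingual": 1_000, "flash": 500}


def estimated_minutes(plan: str, model_family: str) -> float:
    """Approximate minutes of TTS audio a plan's monthly credits cover."""
    return PLAN_CREDITS[plan] / CREDITS_PER_MINUTE[model_family]


def smallest_plan_for(minutes: float, model_family: str) -> Optional[str]:
    """Return the first listed plan that covers the minutes, or None."""
    for plan in PLAN_CREDITS:  # dicts preserve insertion order
        if estimated_minutes(plan, model_family) >= minutes:
            return plan
    return None
```

For example, 150 minutes a month fits Creator with Flash models but needs Pro with Multilingual v2. A result of None means the volume exceeds the listed plans and calls for Enterprise pricing.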
Pros & Cons
Pros
- The most natural sounding TTS API available. Eleven v3 and Multilingual v2 produce speech that is nearly indistinguishable from human voice in most cases.
- Comprehensive API surface: TTS, STT, voice cloning, dubbing, sound effects, music, voice changer, voice isolator, and conversational AI agents all accessible from a single API key.
- Ultra low latency WebSocket streaming (~75ms with Flash v2.5) makes it suitable for real time conversational applications and live voice agents.
- Official SDKs for Python, TypeScript, React, Swift, and Unity (C#) with clean, well documented interfaces. Getting started takes minutes.
- Flexible model selection: choose between highest quality (v3/Multilingual v2), lowest latency (Flash v2.5), or balanced (Turbo v2.5) depending on your use case.
- Voice cloning from a single 30 second audio sample gives developers programmatic access to create custom voices for their applications.
- Credit based pricing with tiered plans means you can start free and scale predictably. Flash models cost roughly half the credits of premium models.
Cons
- Character based pricing gets expensive at high volumes. At the Business tier ($1,320/mo for 11,000,000 credits), Multilingual v2 TTS works out to about $0.12 per 1K characters at one credit per character, which is significantly more than Amazon Polly or Google Cloud TTS.
- Eleven v3 (the most expressive model) has a 5,000 character limit per request, so longer content must be split across requests. Flash and Turbo models allow up to 40,000 characters.
- Concurrency limits are relatively low on lower tiers (2 concurrent Multilingual v2 requests on Free, 3 on Starter). High traffic applications need Scale tier or above.
- Unused credits roll over for at most two months, and they expire if you downgrade or cancel your subscription.
- Text normalization (numbers, dates, currencies) is disabled by default on Flash v2.5 to maintain low latency. Enabling it requires an Enterprise plan; otherwise, pre-process the text before sending it to the API.
- Enterprise and ElevenAgents pricing for conversational AI is not publicly transparent. Building production voice agents at scale requires contacting sales.
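One way to live with the concurrency limits on lower tiers is to cap in-flight requests on the client side. The sketch below uses an asyncio semaphore with the per plan limits from the pricing section. run_limited is a hypothetical helper, and each job stands in for whatever coroutine actually calls the API.

```python
import asyncio

# Concurrent request limits per plan from the pricing section:
# (Multilingual, Flash/Turbo)
CONCURRENCY = {
    "Free": (2, 4),
    "Starter": (3, 6),
    "Creator": (5, 10),
    "Pro": (10, 20),
    "Scale": (15, 30),
    "Business": (15, 30),
}


async def run_limited(jobs, limit: int) -> list:
    """Run zero-argument coroutine functions with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        # Wait for a free slot before starting the request.
        async with semaphore:
            return await job()

    # gather returns results in the same order as the submitted jobs.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Queuing requests locally like this trades a little latency for staying under the plan's limit.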
How to get started
Get your API key
Create a free ElevenLabs account and navigate to Settings > API Keys in the dashboard. Copy your API key and store it as an environment variable (ELEVENLABS_API_KEY). The free tier gives you 10,000 credits per month to test all API endpoints.
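A minimal sketch of reading the key from the environment and failing early when it is missing. load_api_key is a hypothetical helper; only the variable name ELEVENLABS_API_KEY comes from the step above.

```python
import os


def load_api_key(env=os.environ) -> str:
    """Return the API key from the environment, or raise if it is not set."""
    key = env.get("ELEVENLABS_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set ELEVENLABS_API_KEY before calling the API")
    return key
```

Passing a plain dict as env makes the function easy to exercise in tests without touching the real environment.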
Install the SDK
Install the official SDK for your language. Python: pip install elevenlabs. TypeScript/Node: npm install @elevenlabs/elevenlabs-js (older guides reference the earlier elevenlabs package name). The SDKs handle authentication, request formatting, and response parsing. Also install python-dotenv (Python) or dotenv (Node) for environment variable management.
Make your first TTS request
Initialize the client with your API key and call text_to_speech.convert() with text, a voice_id, and a model_id. The default model is eleven_multilingual_v2. For lowest latency, use eleven_flash_v2_5. The response is an audio stream you can play or save to a file.
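For readers who want to see the HTTP call underneath the SDK, here is a standard library sketch of the same request. It assumes the base URL https://api.elevenlabs.io, the xi-api-key header, and a JSON body with text and model_id. Treat those details as assumptions to check against the API reference. build_tts_request and save_speech are hypothetical helpers.

```python
import json
import urllib.request

BASE_URL = "https://api.elevenlabs.io"  # assumed API host


def build_tts_request(api_key: str, voice_id: str, text: str,
                      model_id: str = "eleven_multilingual_v2"):
    """Build (but do not send) a POST /v1/text-to-speech/{voice_id} request."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/text-to-speech/{voice_id}",
        data=body,
        method="POST",
        # Header name for the key is an assumption.
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )


def save_speech(request, path: str) -> None:
    """Send the request and write the returned audio bytes to a file."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Swapping model_id to eleven_flash_v2_5 is the only change needed to try the lower latency model.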
Browse available voices
Use GET /v1/voices to list all available voices, or browse the voice library at elevenlabs.io/voice-library. Each voice has a unique voice_id you pass to the TTS endpoint. Try different voices to find the right match for your application. You can also clone a voice by uploading audio via POST /v1/voices/add.
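Once the voice list has been fetched, looking up a voice_id by name is a short loop. The sketch assumes the response is a JSON object with a voices array whose entries carry name and voice_id. That shape is an assumption, and find_voice_id is a hypothetical helper.

```python
from typing import Optional


def find_voice_id(voices_response: dict, name: str) -> Optional[str]:
    """Return the voice_id of the first voice whose name matches, ignoring case."""
    for voice in voices_response.get("voices", []):
        if voice.get("name", "").lower() == name.lower():
            return voice.get("voice_id")
    return None
```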
Explore advanced endpoints
Try WebSocket streaming for real time TTS, Scribe v2 for speech to text transcription, the conversational AI agents API for building voice bots, or the dubbing API for video localization. The API reference covers every endpoint with request/response examples.
Deep dive
Detailed guides with comparisons, tips, and visuals for each feature.
TTS Models: Choosing the Right One
A detailed comparison of Eleven v3, Multilingual v2, Flash v2.5, and Turbo v2.5 to help you pick the right model for your API integration.
Beyond TTS: STT, Music, Sound Effects, and Dubbing
The ElevenLabs API is more than text to speech. Explore the speech to text, music generation, sound effects, voice isolation, and dubbing endpoints.
Building Conversational AI Agents
How to use the ElevenAgents WebSocket API to build real time voice agents for customer support, sales, and interactive experiences.
API Pricing and Credit System Explained
How the credit system works, what each model costs per character or minute, overage rates, and tips for optimizing API spend.
Last updated: 2026-02-21