Beyond TTS: STT, Music, Sound Effects, and Dubbing | ElevenLabs API

🎤

Speech to text in 90+ languages with word-level timestamps, speaker diarization (up to 32 speakers), and entity detection. Batch and real-time WebSocket modes.
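Word-level timestamps combined with diarization are usually post-processed into speaker turns. The following is a minimal sketch of that step, not the ElevenLabs response format: the `Word` type, its field names, and the `speaker_0`-style labels are assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Word:
    """One transcribed word (hypothetical shape, not the real API schema)."""
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    speaker: str  # diarization label, e.g. "speaker_0"


def speaker_turns(words):
    """Merge consecutive words by the same speaker into (speaker, start, end, text) turns."""
    turns = []
    for speaker, group in groupby(words, key=lambda w: w.speaker):
        chunk = list(group)
        text = " ".join(w.text for w in chunk)
        turns.append((speaker, chunk[0].start, chunk[-1].end, text))
    return turns


words = [
    Word("Hello", 0.0, 0.4, "speaker_0"),
    Word("there.", 0.4, 0.8, "speaker_0"),
    Word("Hi!", 1.0, 1.3, "speaker_1"),
]
turns = speaker_turns(words)
for turn in turns:
    print(turn)
```

Grouping only consecutive words keeps the turns in chronological order, so a speaker who talks twice produces two separate turns.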

🎵

Text to music with genre, style, and structure control. Vocals in multiple languages. Section-level editing. Tracks up to 5 minutes long.
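When planning a track section by section, it can help to check the total length against the 5-minute cap before sending a request. This is only a sketch: the `(name, seconds)` plan format and the function names are hypothetical, and the real API may express sections differently.

```python
MAX_TRACK_SECONDS = 5 * 60  # "up to 5 minutes", taken from the text above


def total_duration(sections):
    """Sum the planned length, in seconds, of a list of (name, seconds) sections."""
    return sum(seconds for _name, seconds in sections)


def fits_limit(sections, limit=MAX_TRACK_SECONDS):
    """Return True if the planned sections fit within the track length limit."""
    return total_duration(sections) <= limit


# Hypothetical section plan; names and lengths are illustrative only.
plan = [
    ("intro", 20),
    ("verse", 60),
    ("chorus", 45),
    ("verse", 60),
    ("chorus", 45),
    ("outro", 30),
]
print(total_duration(plan), fits_limit(plan))
```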

🔊

Text to sound effects. Describe what you need in natural language. Royalty-free MP3 or WAV output.

🌍

Automatic video and audio dubbing in 29 languages. Preserves the original speaker's voice. Supports MP4, WAV, MOV, and MP3.

🔇

Remove background noise and reverb. Accepts files up to 500 MB and 1 hour in length. WAV, MP3, FLAC, OGG, and AAC inputs.
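The limits above can be checked locally before uploading a file. The sketch below is an illustration under stated assumptions, not part of the ElevenLabs SDK: it treats 500 MB as 500,000,000 bytes, assumes both the size and the duration limit apply, and expects the caller to supply the duration.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".aac"}
MAX_BYTES = 500 * 1000 * 1000  # assumption: decimal megabytes
MAX_SECONDS = 60 * 60          # 1 hour


def check_input(filename, size_bytes, duration_seconds):
    """Return a list of problems with a candidate input file (empty if none)."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("unsupported format")
    if size_bytes > MAX_BYTES:
        problems.append("file too large")
    if duration_seconds > MAX_SECONDS:
        problems.append("audio too long")
    return problems


print(check_input("talk.WAV", 10_000_000, 120.0))
print(check_input("clip.m4a", 600_000_000, 4000.0))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a file in a single pass.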

🔄

Speech-to-speech voice transformation. Apply any voice to existing audio while preserving content and timing.