Beyond TTS: STT, Music, Sound Effects, and Dubbing | ElevenLabs API

🎤

Scribe v2 (STT): Speech-to-text in 90+ languages with word-level timestamps, speaker diarization (up to 32 speakers), and entity detection. Available in batch mode and as a real-time WebSocket stream.
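When diarization and word-level timestamps are enabled, a batch transcript comes back as a flat list of words tagged with speakers, and this list usually needs to be regrouped into speaker turns. The sketch below does that regrouping. It assumes the response's `words` array holds entries with `text`, `start`, `end`, `type` (`"word"`, `"spacing"`, or `"audio_event"`) and `speaker_id`, as returned by `POST /v1/speech-to-text` with `diarize` enabled and `timestamps_granularity="word"`. Treat those field names as assumptions to check against the current API reference. `Turn` and `group_by_speaker` are illustrative names and are not part of any SDK.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One uninterrupted stretch of speech by a single speaker."""
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str


def group_by_speaker(words: list[dict]) -> list[Turn]:
    """Merge consecutive word entries from the same speaker into turns.

    Assumed entry shape (check the API reference):
    {"text": str, "start": float, "end": float,
     "type": "word" | "spacing" | "audio_event", "speaker_id": str}
    """
    turns: list[Turn] = []
    for entry in words:
        if entry.get("type") == "spacing":
            # Whitespace between words: attach it to the current turn, if any.
            if turns:
                turns[-1].text += entry["text"]
            continue
        speaker = entry.get("speaker_id", "unknown")
        if turns and turns[-1].speaker == speaker:
            # Same speaker as the previous word: extend the current turn.
            turns[-1].text += entry["text"]
            turns[-1].end = entry["end"]
        else:
            # First word, or the speaker changed: start a new turn.
            turns.append(Turn(speaker, entry["start"], entry["end"], entry["text"]))
    for turn in turns:
        # Drop whitespace carried over from trailing spacing entries.
        turn.text = turn.text.strip()
    return turns
```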

🎵

Music Generation: Text-to-music with control over genre, style, and structure. Supports vocals in multiple languages, section-level editing, and tracks up to 5 minutes long.
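Structure control and section-level editing suggest that a track can be described as an ordered list of sections. The sketch below builds such a plan and enforces the 5-minute ceiling locally before anything is sent. The field names (`global_styles`, `section_name`, `duration_ms`, `styles`, `lines`) and the helper `build_music_plan` are hypothetical. The Music API's actual request schema may differ, so map these fields onto the documented ones before use.

```python
import json
from dataclasses import asdict, dataclass, field

# The Music Generation card states tracks of up to 5 minutes.
MAX_TRACK_MS = 5 * 60 * 1000


@dataclass
class Section:
    # Hypothetical field names, not the documented Music API schema.
    section_name: str
    duration_ms: int
    styles: list[str] = field(default_factory=list)
    lines: list[str] = field(default_factory=list)  # lyrics; empty for instrumental


def build_music_plan(global_styles: list[str], sections: list[Section]) -> str:
    """Serialize a section-by-section plan, rejecting invalid lengths."""
    if not sections:
        raise ValueError("a plan needs at least one section")
    for s in sections:
        if s.duration_ms <= 0:
            raise ValueError(f"section {s.section_name!r} has no duration")
    total_ms = sum(s.duration_ms for s in sections)
    if total_ms > MAX_TRACK_MS:
        raise ValueError(f"plan is {total_ms} ms; the limit is {MAX_TRACK_MS} ms")
    plan = {
        "global_styles": global_styles,
        "sections": [asdict(s) for s in sections],
    }
    return json.dumps(plan)
```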

🔊

Sound Effects: Text-to-sound-effects generation. Describe the sound you need in natural language; the output is royalty-free MP3 or WAV audio.
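A sound-effect request is a single JSON call. The following minimal sketch uses only the standard library. It assumes the `POST /v1/sound-generation` endpoint, the `xi-api-key` header, and the body fields `text`, `duration_seconds`, and `prompt_influence`, all of which should be verified against the API reference. The function only builds the request, so it can be inspected without network access. `sound_effect_request` is an illustrative name.

```python
import json
import urllib.request
from typing import Optional

API_BASE = "https://api.elevenlabs.io/v1"


def sound_effect_request(
    api_key: str,
    prompt: str,
    duration_seconds: Optional[float] = None,
    prompt_influence: Optional[float] = None,
) -> urllib.request.Request:
    """Build (but do not send) a text-to-sound-effects request.

    Endpoint and field names are assumptions; verify against the API reference.
    """
    body: dict = {"text": prompt}
    # Optional knobs are only sent when set, so server defaults apply otherwise.
    if duration_seconds is not None:
        body["duration_seconds"] = duration_seconds
    if prompt_influence is not None:
        body["prompt_influence"] = prompt_influence
    return urllib.request.Request(
        f"{API_BASE}/sound-generation",
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen` and write `resp.read()` to a file. This assumes the response body is the encoded audio itself.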

🌍

Dubbing: Automatic dubbing of video and audio into 29 languages that preserves the original speaker's voice. Supports MP4, WAV, MOV, and MP3 input.
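Dubbing a full recording takes time. As we understand the API, you submit the media, receive a job ID, and poll until the dub is ready. The polling loop below is a sketch that takes the status lookup as a caller-supplied function, which keeps it independent of any HTTP client and easy to test. It assumes that the job is queried via `GET /v1/dubbing/{dubbing_id}` and that its `status` field ends as `"dubbed"` on success or `"failed"` on error. Those status values and the name `wait_for_dub` are assumptions.

```python
import time
from typing import Callable


def wait_for_dub(
    get_status: Callable[[], str],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a dubbing job until it finishes, fails, or times out.

    get_status is supplied by the caller, e.g. a function that fetches
    GET /v1/dubbing/{dubbing_id} and returns its "status" field.
    Assumed terminal statuses: "dubbed" (success) and "failed".
    """
    waited = 0.0
    while True:
        status = get_status()
        if status == "dubbed":
            return status
        if status == "failed":
            raise RuntimeError("dubbing job failed")
        if waited >= timeout_s:
            raise TimeoutError(f"job still {status!r} after {waited:.0f} s")
        # Any other status (e.g. still processing): wait and poll again.
        sleep(interval_s)
        waited += interval_s
```

Injecting `sleep` lets tests run instantly. Once the loop returns, the dubbed audio would be fetched per target language, possibly via `GET /v1/dubbing/{dubbing_id}/audio/{language_code}`, though that endpoint is likewise an assumption to confirm.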

🔇

Voice Isolator: Removes background noise and reverb. Accepts WAV, MP3, FLAC, OGG, and AAC files of up to 500 MB and up to 1 hour long.
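Files over the stated limits will not be accepted, so checking inputs locally before a large upload can save time. The sketch below encodes the limits from the Voice Isolator card (500 MB, 1 hour, WAV/MP3/FLAC/OGG/AAC). It assumes that "500MB" means 500 × 1024² bytes, and it checks only the file extension, not the actual codec. The duration has to come from the caller, for example from a media probe tool. `isolator_input_problems` is a hypothetical helper. Files that pass would then be uploaded, presumably to `POST /v1/audio-isolation` with the file in an `audio` form field, though that endpoint and field name are also assumptions.

```python
from pathlib import PurePath

# Limits as stated on the Voice Isolator card.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".aac"}
MAX_BYTES = 500 * 1024 * 1024  # assumption: "500MB" read as 500 MiB
MAX_SECONDS = 60 * 60          # 1 hour


def isolator_input_problems(filename: str, size_bytes: int, duration_s: float) -> list[str]:
    """Return the reasons a file breaks the stated limits (empty list if none).

    Only the extension is checked, not the real codec; duration_s must be
    supplied by the caller.
    """
    problems: list[str] = []
    ext = PurePath(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(no extension)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes / 1024**2:.0f} MiB; limit is 500 MiB")
    if duration_s > MAX_SECONDS:
        problems.append(f"audio is {duration_s / 60:.1f} min; limit is 60 min")
    return problems
```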

🔄

Voice Changer: Speech-to-speech voice transformation. Applies any voice to existing audio while preserving its content and timing.
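A speech-to-speech request carries an audio file, so it is sent as `multipart/form-data` rather than JSON. The sketch below builds such a request using only the standard library. It assumes the `POST /v1/speech-to-speech/{voice_id}` endpoint, an `audio` file field, a `model_id` form field, and the default model ID `eleven_multilingual_sts_v2`. Check all of these against the API reference. The request is built but not sent, and `encode_multipart` and `voice_changer_request` are illustrative names.

```python
import urllib.request
import uuid

API_BASE = "https://api.elevenlabs.io/v1"


def encode_multipart(fields: dict, file_field: str, filename: str,
                     payload: bytes, content_type: str) -> tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data (RFC 7578)."""
    # A random hex boundary makes a collision with the audio bytes very unlikely.
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        chunks.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    )
    chunks.append(file_header.encode("utf-8") + payload + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def voice_changer_request(api_key: str, voice_id: str, audio: bytes,
                          filename: str = "input.mp3",
                          content_type: str = "audio/mpeg",
                          model_id: str = "eleven_multilingual_sts_v2") -> urllib.request.Request:
    """Build (but do not send) a speech-to-speech request.

    Endpoint, field names, and the default model_id are assumptions.
    """
    body, multipart_type = encode_multipart(
        {"model_id": model_id}, "audio", filename, audio, content_type
    )
    return urllib.request.Request(
        f"{API_BASE}/speech-to-speech/{voice_id}",
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": multipart_type},
        method="POST",
    )
```

The hand-rolled encoder keeps the example dependency-free. In practice, a library such as `requests` (via its `files=` argument) handles the multipart encoding for you.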