Beyond TTS: STT, Music, Sound Effects, and Dubbing

All plans (feature availability varies by tier)
🎤
Scribe v2 (STT)

90+ languages, word-level timestamps, speaker diarization (up to 32 speakers), and entity detection. Available in batch and real-time WebSocket modes.
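Word-level timestamps and diarization are usually consumed by merging consecutive words from the same speaker into turns. The sketch below shows that step on a plain list of dictionaries. The field names (`speaker`, `text`, `start`, `end`) and the function `group_speaker_turns` are illustrative assumptions, not Scribe's documented response schema; adapt them to the actual payload.

```python
def group_speaker_turns(words):
    """Merge consecutive words from the same speaker into turns.

    `words` is a list of dicts with hypothetical keys:
    "speaker", "text", "start", "end" (times in seconds).
    """
    turns = []
    for word in words:
        if turns and turns[-1]["speaker"] == word["speaker"]:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + word["text"]
            turns[-1]["end"] = word["end"]
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append({
                "speaker": word["speaker"],
                "start": word["start"],
                "end": word["end"],
                "text": word["text"],
            })
    return turns


if __name__ == "__main__":
    sample = [
        {"speaker": "A", "text": "Hello", "start": 0.0, "end": 0.4},
        {"speaker": "A", "text": "there", "start": 0.4, "end": 0.8},
        {"speaker": "B", "text": "Hi", "start": 1.0, "end": 1.2},
    ]
    for turn in group_speaker_turns(sample):
        print(f'{turn["speaker"]} [{turn["start"]}-{turn["end"]}]: {turn["text"]}')
```

Run on the sample, this yields two turns: speaker A saying "Hello there" from 0.0 to 0.8 seconds, then speaker B saying "Hi" from 1.0 to 1.2 seconds.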

🎵
Music Generation

Text-to-music generation with control over genre, style, and structure. Vocals in multiple languages. Section-level editing. Tracks up to 5 minutes long.

🔊
Sound Effects

Text-to-sound-effects generation: describe the sound you need in natural language. Royalty-free output in MP3 or WAV.

🌍
Dubbing

Automatic video and audio dubbing in 29 languages. Preserves the original speaker's voice. Supports MP4, WAV, MOV, and MP3.

🔇
Voice Isolator

Removes background noise and reverb. Accepts files up to 500 MB or 1 hour long. Input formats: WAV, MP3, FLAC, OGG, AAC.
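The limits above are easy to check locally before uploading. The following is a minimal sketch of such a pre-flight check; the function name `isolator_preflight` is hypothetical, the file size and duration are passed in by the caller rather than measured, and it assumes "500 MB" means 500 × 1024 × 1024 bytes and that both limits apply at once, neither of which the text specifies.

```python
from pathlib import Path

# Limits taken from the description above. The byte interpretation of
# "500 MB" is an assumption.
MAX_BYTES = 500 * 1024 * 1024
MAX_SECONDS = 60 * 60
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".aac"}


def isolator_preflight(filename: str, size_bytes: int, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes (limit {MAX_BYTES})")
    if duration_seconds > MAX_SECONDS:
        problems.append(f"audio too long: {duration_seconds} s (limit {MAX_SECONDS})")
    return problems


if __name__ == "__main__":
    # A 20 MB, 5-minute FLAC file passes; a 2-hour M4A file fails two checks.
    print(isolator_preflight("interview.flac", 20 * 1024 * 1024, 300))
    print(isolator_preflight("podcast.m4a", 80 * 1024 * 1024, 7200))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a file in one pass.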

🔄
Voice Changer

Speech-to-speech voice transformation. Apply any voice to existing audio while preserving content and timing.