Scribe v2 (STT): 90+ languages, word-level timestamps, speaker diarization (up to 32 speakers), entity detection. Batch and real-time WebSocket modes.
Music Generation: Text-to-music with genre, style, and structure control. Vocals in multiple languages. Section-level editing. Up to 5 minutes.
Sound Effects: Text-to-sound-effects. Describe what you need in natural language. Royalty-free MP3 or WAV output.
Dubbing: Automatic video/audio dubbing in 29 languages. Preserves original speaker voice. Supports MP4, WAV, MOV, MP3.
Voice Isolator: Remove background noise and reverb. Accepts files up to 500 MB/1 hour. WAV, MP3, FLAC, OGG, AAC inputs.
Voice Changer: Speech-to-speech voice transformation. Apply any voice to existing audio while preserving content and timing.
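
The Voice Isolator limits listed above (500 MB, 1 hour, and five input formats) can be checked locally before uploading. The following is a minimal sketch of such a check, not part of any official SDK. The function name `isolator_input_problems` is hypothetical, reading "500 MB" as 500 * 1024 * 1024 bytes is an assumption, and the caller is expected to supply the file size and, if known, the duration.

```python
from pathlib import Path
from typing import List, Optional

# Limits copied from the Voice Isolator entry in the list above.
# Assumption: "500 MB" is interpreted as 500 * 1024 * 1024 bytes.
MAX_BYTES = 500 * 1024 * 1024
MAX_SECONDS = 60 * 60
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".aac"}


def isolator_input_problems(
    path: str,
    size_bytes: int,
    duration_seconds: Optional[float] = None,
) -> List[str]:
    """Return a list of reasons the file would exceed the stated limits.

    Hypothetical helper: an empty list means no problem was found.
    The duration check is skipped when duration_seconds is None.
    """
    problems: List[str] = []
    ext = Path(path).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file larger than 500 MB")
    if duration_seconds is not None and duration_seconds > MAX_SECONDS:
        problems.append("audio longer than 1 hour")
    return problems
```

For a file on disk, the size can be obtained with `Path(path).stat().st_size`; measuring duration requires an audio library and is left to the caller here.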