Input Modalities
Seedance 2.0 accepts four types of input simultaneously, with the following limits per generation:
- Text: A descriptive prompt guiding the scene, action, style, and composition of the output video.
- Images: Up to 9 reference images. These can be character references, background plates, style guides, or any visual material you want the model to incorporate.
- Video clips: Up to 3 reference clips with a combined length of up to 15 seconds. Useful for providing motion references, style templates, or continuation footage.
- Audio files: Up to 3 audio files with a combined length of up to 15 seconds. The model can synchronize the generated video to the audio, including lip sync for speech.
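You can use up to 12 reference files in total, across all modalities, in a single generation request. Because the per-type limits add up to 15 files (9 images, 3 video clips, 3 audio files), you cannot max out all three types in the same request.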
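The limits above can be checked locally before a request is sent. The following is a minimal Python sketch, not part of any Seedance SDK: the `Reference` class and the `check_references` function are hypothetical names introduced here for illustration, and only the numeric limits come from the list above.

```python
from dataclasses import dataclass

# Numeric limits taken from the list above.
MAX_IMAGES = 9
MAX_CLIPS_PER_TYPE = 3        # applies to video and to audio separately
MAX_SECONDS_PER_TYPE = 15.0   # combined duration per type
MAX_TOTAL_FILES = 12


@dataclass
class Reference:
    """Hypothetical local description of one reference file."""
    kind: str             # "image", "video", or "audio"
    seconds: float = 0.0  # duration; left at 0 for images


def check_references(refs):
    """Return a list of limit violations; an empty list means the set fits."""
    problems = []
    images = [r for r in refs if r.kind == "image"]
    if len(images) > MAX_IMAGES:
        problems.append(f"{len(images)} images (max {MAX_IMAGES})")

    for kind in ("video", "audio"):
        timed = [r for r in refs if r.kind == kind]
        total = sum(r.seconds for r in timed)
        if len(timed) > MAX_CLIPS_PER_TYPE:
            problems.append(f"{len(timed)} {kind} files (max {MAX_CLIPS_PER_TYPE})")
        if total > MAX_SECONDS_PER_TYPE:
            problems.append(f"{total:g}s of {kind} (max {MAX_SECONDS_PER_TYPE:g}s)")

    if len(refs) > MAX_TOTAL_FILES:
        problems.append(f"{len(refs)} files in total (max {MAX_TOTAL_FILES})")
    return problems


# 9 images + 3 clips + 1 audio file: each type is within its own limit,
# but 13 files exceeds the overall cap of 12.
refs = ([Reference("image")] * 9
        + [Reference("video", 5.0)] * 3
        + [Reference("audio", 10.0)])
print(check_references(refs))  # ['13 files in total (max 12)']
```

The example shows why the overall cap matters: every per-type limit is satisfied, yet the request would still be over the 12-file total.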
The @Tag System
The @tag system is what makes Seedance 2.0's multimodal input practical rather than chaotic. When you upload multiple reference files, you can assign each one a tag (e.g., @character1, @background, @style) and then reference those tags in your text prompt. This tells the model exactly which reference should influence which part of the output.
For example, you might upload a photo of a person tagged as @speaker, an audio file tagged as @dialogue, and a landscape photo tagged as @setting, then write a prompt like: "@speaker stands in @setting and delivers @dialogue with dramatic lighting." The model uses each reference for its designated purpose rather than blending them indiscriminately.
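A common mistake with tagged prompts is referencing a tag that was never assigned, or uploading a file that the prompt never mentions. The sketch below catches both before submission. It is illustrative only: the tag pattern (an `@` followed by letters, digits, or underscores), the `check_tags` helper, and the file names are assumptions, not Seedance's documented behavior.

```python
import re

# Assumed tag syntax: "@" followed by letters, digits, or underscores.
TAG_PATTERN = re.compile(r"@([A-Za-z0-9_]+)")


def check_tags(prompt, references):
    """Compare tags used in the prompt with the tags assigned to uploads.

    references maps a tag name (without "@") to a file name.
    Returns (undefined, unused): tags the prompt uses but no file carries,
    and tags assigned to a file but never mentioned in the prompt.
    """
    used = set(TAG_PATTERN.findall(prompt))
    assigned = set(references)
    return sorted(used - assigned), sorted(assigned - used)


# Hypothetical uploads matching the example above.
references = {
    "speaker": "person.jpg",
    "dialogue": "line.wav",
    "setting": "landscape.jpg",
}
prompt = "@speaker stands in @setting and delivers @dialogue with dramatic lighting."

print(check_tags(prompt, references))  # ([], [])
```

An empty pair means every tag in the prompt has a file behind it and every uploaded file is used.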
Output Specifications
- Resolution: Up to 2K
- Duration: 4 to 15 seconds per clip
- Frame rate: 24 FPS
- Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4
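Since the frame rate is fixed at 24 FPS, the number of frames in a clip follows directly from its duration: 4 seconds is 96 frames and 15 seconds is 360 frames. The short Python sketch below validates a requested duration and aspect ratio against the specifications above and returns the frame count; `frame_count` is a hypothetical helper name, not an API call.

```python
FPS = 24
MIN_SECONDS, MAX_SECONDS = 4, 15
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}


def frame_count(seconds, aspect_ratio):
    """Validate the requested output settings and return the frame count."""
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(f"duration must be {MIN_SECONDS}-{MAX_SECONDS}s, got {seconds}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    return round(seconds * FPS)


print(frame_count(4, "16:9"))   # 96
print(frame_count(15, "9:16"))  # 360
```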
Native Audio and Lip Sync
Seedance 2.0 generates audio natively as part of the video output. When the scene includes speech, the model produces lip-synced audio in 9+ languages. This eliminates the separate audio post-production step that other video generators require. You can also provide your own audio reference files for the model to synchronize against.
Editing Tools
Beyond generation, Seedance 2.0 includes several editing capabilities:
- Extend: Lengthen an existing clip beyond its original duration.
- Merge: Combine multiple generated clips into a seamless sequence.
- Restyle: Apply a new visual style or aesthetic to existing footage.
- Character swap: Replace a character in a generated video with a different reference.