Step 1: Provide input images
Drag images into the Subject, Scene, and Style slots. Fill one, two, or all three depending on your creative intent.
Step 2: Gemini generates captions
Gemini analyzes each input image and writes a detailed text caption for it, describing the subject, scene, or style according to the slot the image occupies.
Step 3: Review and edit prompts
View the auto-generated captions and refine them. Adjust the wording, add details, or change the action while preserving the core references.
Step 4: Imagen 3 generates output
The combined prompt is sent to Imagen 3, which produces a remixed image that blends whichever subject, scene, and style inputs you provided.
Step 5: Iterate or download
Review the result. Swap inputs, edit prompts, or regenerate for variations. Download when satisfied.
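Steps 2 through 4 amount to merging per-slot captions into one prompt for the image model. The sketch below shows that merge under stated assumptions. It starts from captions that have already been written, makes no Gemini or Imagen calls, and uses a hypothetical `compose_prompt` function with a labeled-line format. Whisk's actual prompt template is not described here. The sketch illustrates only how one, two, or three filled slots could become a single prompt.

```python
from typing import Optional


def compose_prompt(subject: Optional[str] = None,
                   scene: Optional[str] = None,
                   style: Optional[str] = None) -> str:
    """Merge whichever slot captions are present into one text prompt.

    Hypothetical: the "Label: caption" layout is an illustrative assumption,
    not Whisk's real prompt format. Empty or missing slots are skipped,
    mirroring the option to fill one, two, or all three slots.
    """
    parts = []
    if subject:
        parts.append(f"Subject: {subject}")
    if scene:
        parts.append(f"Scene: {scene}")
    if style:
        parts.append(f"Style: {style}")
    if not parts:
        # At least one slot must be filled before anything can be generated.
        raise ValueError("At least one of subject, scene, or style is required")
    return "\n".join(parts)


if __name__ == "__main__":
    # Captions here are hand-written stand-ins for model-generated text.
    prompt = compose_prompt(
        subject="A small orange tabby cat wearing a red scarf",
        style="Soft watercolor with visible paper texture",
    )
    print(prompt)
```

Editing a caption in Step 3 corresponds to changing one argument before the call. Because the slots are optional keyword arguments, a scene-only or style-only remix works the same way as a full three-slot one.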
Subject
The main focus of the image: a person, animal, object, or character. Whisk extracts its core identity and appearance.
Scene
The environment or setting: a landscape, room, cityscape, or abstract background. Sets the spatial context and lighting.
Style
The visual aesthetic: watercolor, pixel art, photography, illustration. Controls how the final image looks and feels.