1
Provide input images
Drag images into the Subject, Scene, and Style slots. Fill one, two, or all three depending on your creative intent.
2
Gemini generates captions
Gemini analyzes each input image and produces detailed text descriptions capturing the essence of subject, scene, and style.
3
Review and edit prompts
View the auto-generated captions and refine them. Adjust wording, add details, or change the action while preserving the core references.
4
Imagen 3 generates output
The combined prompt is sent to Imagen 3, which produces a remixed image blending the subject, scene, and style inputs you provided.
5
Iterate or download
Review the result. Swap inputs, edit prompts, or regenerate for variations. Download when satisfied.
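The five steps above can be sketched as a small pipeline. This is a minimal illustration only, not Whisk's actual implementation: the names `Captions`, `combine_prompt`, and `remix`, the "Subject: ... Scene: ... Style: ..." prompt layout, and the `caption_image` and `generate_image` callables (stand-ins for Gemini and Imagen 3) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Captions:
    """Editable captions for the three slots; None means the slot was left empty."""
    subject: Optional[str] = None
    scene: Optional[str] = None
    style: Optional[str] = None


def combine_prompt(captions: Captions) -> str:
    """Join whichever captions are present into one prompt (hypothetical layout)."""
    parts = []
    for label, text in (("Subject", captions.subject),
                        ("Scene", captions.scene),
                        ("Style", captions.style)):
        if text:
            # Drop surrounding whitespace and a trailing period before joining.
            parts.append(f"{label}: {text.strip().rstrip('.')}")
    if not parts:
        raise ValueError("at least one slot must be filled")
    return ". ".join(parts)


def remix(images: Dict[str, bytes],
          caption_image: Callable[[bytes], str],
          generate_image: Callable[[str], bytes],
          edits: Optional[Dict[str, str]] = None) -> bytes:
    """Run steps 2 to 4; `images` maps 'subject', 'scene', or 'style' to image data."""
    # Step 2: caption each supplied image (assumed stand-in for Gemini).
    fields = {slot: caption_image(data) for slot, data in images.items()}
    # Step 3: the user's edits override the auto-generated captions.
    fields.update(edits or {})
    # Step 4: send the combined prompt to the image model (assumed stand-in for Imagen 3).
    return generate_image(combine_prompt(Captions(**fields)))
```

Passing the captioning and generation steps in as callables keeps the sketch independent of any particular API. Iterating (step 5) then amounts to calling `remix` again with different images or edits.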
Subject: The main focus of the image: a person, animal, object, or character. Whisk extracts core identity and appearance.
Scene: The environment or setting: a landscape, room, cityscape, or abstract background. Sets spatial context and lighting.
Style: The visual aesthetic: watercolor, pixel art, photography, illustration. Controls how the final image looks and feels.
Essence, not exact replication
Whisk is designed to capture the essence and feel of your input images, not to produce pixel-perfect reproductions. A photo of your dog as the subject will produce an image that looks like your dog in spirit, but fine details such as collar color or exact markings may vary. Think of Whisk as a creative remixing tool rather than a precision editing tool. This design choice is intentional: it enables the fluid creative exploration that makes Whisk unique.
Getting the best results
Use clear, well-lit input images with a strong visual identity. For subjects, close-up photos with a clean background work best. For scenes, choose images with distinctive environmental characteristics. For styles, pick images with a strong and consistent visual aesthetic. If the auto-generated captions miss something important, edit them before generating to guide Imagen 3 more precisely.