Image Remixing: Subject, Scene, and Style | Google Whisk

Step 1: Provide input images

Drag images into the Subject, Scene, and Style slots. Fill one, two, or all three depending on your creative intent.

Step 2: Gemini generates captions

Gemini analyzes each input image and writes a detailed text caption that captures the essence of that image's subject, scene, or style.

Step 3: Review and edit prompts

View the auto-generated captions and refine them. Adjust wording, add details, or change the action while preserving the core references.

Step 4: Imagen 3 generates output

The combined prompt is sent to Imagen 3, which produces a remixed image blending whichever of the three input dimensions you provided.

Step 5: Iterate or download

Review the result. Swap inputs, edit prompts, or regenerate for variations. Download when satisfied.
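The five steps above can be sketched in code. This is only an illustration of the flow, not how Whisk is implemented: the Captions class, the combine_prompt function, and the "Subject: / Scene: / Style:" prompt layout are all hypothetical, and the caption text is made up rather than produced by Gemini. The sketch shows that empty slots are simply skipped and that a caption can be edited before the prompt is assembled.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Captions:
    """Hypothetical container: one caption per slot, None if the slot is empty."""
    subject: Optional[str] = None
    scene: Optional[str] = None
    style: Optional[str] = None


def combine_prompt(captions: Captions) -> str:
    """Join the filled slots into one prompt string, skipping empty ones.

    The labelled layout is an assumption made for this sketch.
    """
    parts = []
    if captions.subject:
        parts.append(f"Subject: {captions.subject}")
    if captions.scene:
        parts.append(f"Scene: {captions.scene}")
    if captions.style:
        parts.append(f"Style: {captions.style}")
    return ". ".join(parts)


# Step 2 stand-in: captions as if generated from two input images (made up).
auto = Captions(
    subject="a small brown terrier with floppy ears",
    style="loose watercolor with soft edges",
)

# Step 3: refine a caption, here adding a detail the caption missed.
edited = replace(auto, subject=auto.subject + ", wearing a red collar")

# Step 4 input: the combined prompt that would be sent to the image model.
prompt = combine_prompt(edited)
print(prompt)
```

Because the scene slot was left empty, the resulting prompt contains only the subject and style parts. Iterating (Step 5) corresponds to changing a caption or swapping a slot and calling combine_prompt again.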

🎯

Subject

The main focus of the image: a person, animal, object, or character. Whisk extracts core identity and appearance.

🌄

Scene

The environment or setting: a landscape, room, cityscape, or abstract background. Sets spatial context and lighting.

🎨

Style

The visual aesthetic: watercolor, pixel art, photography, illustration. Controls how the final image looks and feels.

Essence, not exact replication

Whisk is designed to capture the essence and feel of your input images, not to produce pixel-perfect reproductions. A photo of your dog as the subject will produce an image that looks like your dog in spirit, but fine details like collar color or exact markings may vary. Think of Whisk as a creative remixing tool rather than a precision editing tool. This is an intentional design choice that supports fluid creative exploration.

Getting the best results

Use clear, well-lit input images with a strong visual identity. For subjects, close-up photos with a clean background work best. For scenes, choose images with distinctive environmental characteristics. For styles, pick images with a strong and consistent visual aesthetic. If the auto-generated captions miss something important, edit them before generating to guide Imagen 3 more precisely.