DALL-E 3 (Retired May 2026)
Architecture: Separate diffusion model called by ChatGPT
Text in images: Frequently garbled or misspelled
Multi-turn consistency: Each generation is independent
Instruction following: Often ignored complex constraints
Context awareness: Sees only the prompt ChatGPT wrote
In-context learning: Not supported
ChatGPT Images 2.0 (Current)
Architecture: Built natively into the language model
Text in images: Dramatically improved; accurate and readable even at small sizes
Multi-turn consistency: Characters and styles persist across turns
Instruction following: Reliably follows multi-constraint prompts
Context awareness: Sees the full conversation, uploads, and history
In-context learning: Learns from uploaded reference images
Images with consistency (Thinking Mode): 0
Objects per scene (max reliable): 0
New Pro tier (5x Plus): $0
Free tier (Instant Mode): $0