DALL-E 3 (Legacy)
Architecture: Separate diffusion model called by ChatGPT
Text in images: Frequently garbled or misspelled
Multi-turn consistency: Each generation is independent
Instruction following: Often ignored complex constraints
Context awareness: Sees only the prompt ChatGPT wrote
In-context learning: Not supported

GPT-4o Native (Current)
Architecture: Built natively into the language model
Text in images: Dramatically improved; accurate and readable even at small sizes
Multi-turn consistency: Characters and styles persist across turns
Instruction following: Reliably follows multi-constraint prompts
Context awareness: Sees the full conversation, uploads, and history
In-context learning: Learns from uploaded reference images
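
The context-awareness difference can be pictured with a toy sketch. This is an illustration only, not the actual interface of either system: the `Conversation` class, both functions, and the dictionary keys are hypothetical names invented for this example. It shows what each kind of generator would receive, assuming the legacy pipeline hands over a single rewritten prompt while the native model is conditioned on the whole session.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Toy stand-in for a chat session (hypothetical, for illustration)."""
    turns: list[str] = field(default_factory=list)    # user messages so far
    uploads: list[str] = field(default_factory=list)  # names of uploaded images


def legacy_generator_input(convo: Conversation, rewritten_prompt: str) -> dict:
    """Separate-model pipeline: only the prompt written by the chat model is
    handed over. The conversation is deliberately ignored."""
    return {"prompt": rewritten_prompt}


def native_generator_input(convo: Conversation, request: str) -> dict:
    """Native pipeline: the request travels with the full history and uploads."""
    return {
        "prompt": request,
        "history": list(convo.turns),
        "reference_images": list(convo.uploads),
    }


convo = Conversation(
    turns=["Draw a fox mascot in flat vector style.",
           "Now show the same fox waving."],
    uploads=["brand_palette.png"],
)

legacy = legacy_generator_input(convo, "A fox waving, flat vector style")
native = native_generator_input(convo, convo.turns[-1])

print(sorted(legacy))  # keys available to the separate model
print(sorted(native))  # keys available to the native model
```

In this sketch the legacy input has no record of the earlier fox or the uploaded palette, which is one way to read why each generation was independent and why in-context learning was not supported.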

User preference: [value missing] (vs 62% for DALL-E 3)
Objects per scene (max reliable): [value missing]
Images on Plus plan: [value missing] per 3 hrs
Free tier: available ($0)