How GPT-4o Native Image Generation Works | ChatGPT Image Generation

| Feature | DALL-E 3 (Legacy) | GPT-4o Native (Current) |
| --- | --- | --- |
| Architecture | Separate diffusion model called by ChatGPT | Built into the language model natively |
| Text in images | Frequently garbled or misspelled | Accurate, readable text rendering |
| Multi-turn consistency | Each generation independent | Characters and styles persist across turns |
| Instruction following | Often ignored complex constraints | Reliably follows multi-constraint prompts |
| Context awareness | Only sees the prompt ChatGPT wrote | Sees full conversation, uploads, and history |
| In-context learning | Not supported | Learns from uploaded reference images |
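
The context-awareness difference in the comparison above can be illustrated with a toy sketch. It is not OpenAI's implementation: `Turn`, `legacy_inputs`, and `native_inputs` are hypothetical names invented for this example, and the "prompt writing" step is reduced to reusing the latest user message. It only shows what information each kind of pipeline would have available when generating an image.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One conversation turn (hypothetical structure for illustration)."""
    role: str                                         # "user" or "assistant"
    text: str
    images: list[str] = field(default_factory=list)   # names of uploaded images


def legacy_inputs(conversation: list[Turn]) -> dict:
    """Separate-model setup: the image model receives a single prompt string.

    Assumption: the prompt ChatGPT would write is approximated here by the
    latest user message. Earlier turns and uploads are not passed along.
    """
    last_user = next(t for t in reversed(conversation) if t.role == "user")
    return {"prompt": last_user.text}


def native_inputs(conversation: list[Turn]) -> dict:
    """Native setup: generation is conditioned on the whole conversation."""
    return {
        "messages": [(t.role, t.text) for t in conversation],
        "reference_images": [img for t in conversation for img in t.images],
    }


conversation = [
    Turn("user", "Here is my mascot.", images=["mascot.png"]),
    Turn("assistant", "Got it."),
    Turn("user", "Draw the same character holding a sign that says OPEN."),
]

legacy = legacy_inputs(conversation)
native = native_inputs(conversation)

print(legacy)                       # only the final request survives
print(native["reference_images"])   # the uploaded reference is still visible
```

In the legacy-style path, "the same character" has nothing to refer to, because the uploaded reference never reaches the image model. In the native-style path, the reference image and the earlier turns are part of the input, which is what makes multi-turn consistency and in-context learning possible.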

GPT-4o native image generation is available on the free tier.

DALL-E 3 is still available

If you prefer the DALL-E 3 aesthetic or need the older model for compatibility, you can still access it through the DALL-E GPT in the GPT Store: search for "DALL-E" in the store and choose the official OpenAI GPT. For most use cases, however, GPT-4o native generation produces better results, particularly for text rendering, multi-turn consistency, and prompts with several constraints.